(PR) AMD Instinct MI355X GPUs Surpass 1M Tokens/Sec in MLPerf 6.0
In its MLPerf Inference 6.0 submission, AMD did not simply revisit familiar benchmarks with a faster GPU. We expanded into first-time workloads, crossed the 1-million-tokens-per-second threshold at multi-node scale, and showed that partners can reproduce our results across a broader ecosystem. That combination matters because our customers no longer evaluate inference platforms on one metric alone. They want competitive single-node performance, efficient scale-out, faster bring-up on new models, reproducible results across partner systems, and confidence that the software stack can keep pace. MLPerf Inference 6.0 let us show all of that in one submission.
Just as important, we showed that these results are not isolated. A broad partner ecosystem submitted results across four AMD Instinct GPU types that closely reproduced the numbers AMD itself submitted, and the first three-GPU heterogeneous MLPerf submission demonstrated that AMD hardware and AMD ROCm software can orchestrate meaningful inference throughput even across systems in different geographies.