AMD’s MI355X beat Nvidia’s Blackwell by 4% on a single-node Llama2-70B run in MLPerf 6.0 and reached roughly 88% of B300 performance on a text-to-image model. See why the gap still matters.
AMD’s MI355X GPU posted a 4% higher token rate than Nvidia’s Blackwell GPU on a single‑node Llama2‑70B inference test, but the broader MLPerf 6.0 suite still places Nvidia ahead overall [1]. The benchmark, run by an Nvidia partner, showed AMD closing the gap on a few workloads and reaching roughly 88% of Nvidia’s Blackwell B300 performance on a modern text‑to‑image model that missed the official submission deadline [1].
The results come from the latest round of MLPerf inference benchmarks, where AMD, Nvidia and Intel each submitted runs across the “Open” and “Closed” divisions. In the Open division, AMD’s MI355X used the new FP4 precision and updated ROCm software to boost token generation speed, scaling near-linearly to 11 nodes (up to 96 GPUs) over Ethernet and reaching one million tokens per second [1]. However, the comparison pits AMD’s Open result against Nvidia’s Closed-division submission, so the two are not directly comparable: the Closed division holds the model and key settings fixed for like-for-like results, while the Open division lets submitters change them. Nvidia’s Blackwell Ultra B300 retained leadership in per-GPU and per-rack performance, especially on state-of-the-art models like DeepSeek-R1, where it delivered 2.5 million tokens per second across 288 GPUs, far ahead of AMD’s results [1].
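As a rough illustration of why the headline totals need care, the sketch below divides the two aggregate figures quoted above by their GPU counts. It assumes the full 96 GPUs for AMD's one-million-token run (the article says "up to 96"), and the function name `tokens_per_gpu` is hypothetical. The two runs use different MLPerf divisions, and the article does not say which model produced AMD's figure, so the per-GPU averages describe each run on its own and do not rank the chips.

```python
def tokens_per_gpu(total_tokens_per_s: float, gpus: int) -> float:
    """Average aggregate throughput per GPU for one benchmark run."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    return total_tokens_per_s / gpus


# Figures as quoted in the article [1].
# Assumption: AMD's run used all 96 GPUs; the workload is not stated.
amd_open = tokens_per_gpu(1_000_000, 96)
# Nvidia B300, Closed division, DeepSeek-R1 on 288 GPUs.
nvidia_closed = tokens_per_gpu(2_500_000, 288)

print(f"AMD MI355X (Open):     {amd_open:,.0f} tokens/s per GPU")
print(f"Nvidia B300 (Closed):  {nvidia_closed:,.0f} tokens/s per GPU")
```

A like-for-like ranking would need both vendors' results on the same model, in the same division, at the same scale.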
Nvidia’s advantage stems not just from raw silicon but from aggressive software optimizations. Techniques such as disaggregated serving, Wide Expert Parallelism (WideEP), Multi-Token Prediction (MTP) and KV-aware routing, all part of the Nvidia Dynamo stack, have lifted throughput by as much as threefold on the same hardware [1]. These gains support the source’s argument that AI performance is now a system-wide problem involving networking, CPUs and software, not merely a faster GPU. AMD’s roadmap hints at future rack-scale solutions, but it still lacks NVLink-class scaling and the software ecosystem that Nvidia has cultivated for years [1].
The takeaway is clear: AMD is now within striking distance of Nvidia on certain inference tasks, yet Nvidia’s integrated hardware‑software stack keeps it firmly ahead on the most demanding models. Whether AMD can translate its Open‑division gains into Closed‑division parity—and how quickly Nvidia’s upcoming Groq‑based LPX will reshape the leaderboard—remain the key questions for the AI hardware race.
AI-assisted synthesis by the TrendWatcher Editorial Desk · sourced from 2 outlets · Jun 14, 2026 · How we report
Nvidia designs its chips but relies on Taiwan Semiconductor Manufacturing (TSMC) for the actual manufacturing process.
NVLink Fusion allows third-party accelerators to communicate with Nvidia GPUs and infrastructure over a high-bandwidth, low-latency interconnect.
CEO Jensen Huang has forecast that Nvidia could reach $1 trillion in AI-related revenue by the 2027 calendar year.
Nvidia partners with other firms such as Marvell Technology to integrate custom silicon and networking components into its broader AI infrastructure ecosystem.