Nvidia’s new GPUs are already running into problems

by Yaron

Nvidia’s latest Blackwell GPUs are running into problems in the data center, reports The Information. According to the report, Nvidia’s customers are worried about how well the AI accelerators will hold up, as overheating issues have caused delays in server racks being deployed for AI training.

The Blackwell architecture is at the heart of both Nvidia’s next-gen AI accelerators and its upcoming RTX 50-series graphics cards. In the data center, the architecture was previously delayed due to “design flaws,” pushing the deployment of the B100 and B200 GPUs back. That’s despite big orders with AI players like Meta, Microsoft, and Google.

According to the report, the big problem in the data center stems from cramming 72 of the AI accelerators together in a server rack, which has led to overheating problems. Reuters reports that Nvidia has asked suppliers to redesign the server racks “several times” in order to get around the overheating issues.

Blackwell is a significant step for Nvidia. It’s at the heart of the next generation of GPUs, which could earn spots among the best graphics cards. Blackwell is also a chance for Nvidia to cement its lead over AMD. Team Red has already deployed its MI300X AI accelerator in data centers, and it’s currently rolling out its MI325X accelerator as it prepares next-gen AI chips.

Nvidia claims Blackwell can train large language models at 25 times lower cost and energy consumption than its last-gen Hopper architecture, and that it can train these models up to 30 times faster. That kind of speed-up has a big impact on heat, which is already an issue that data centers need to deal with when it comes to AI accelerators.

The overheating issues could have implications for RTX 50-series GPUs, too. Although we know cards like the RTX 4090 are incredibly efficient when it comes to gaming, Nvidia’s flagship still ran into issues with high power consumption and melting power connectors. The latest speculation is that a card like the RTX 5090 could push power requirements further, up to 600 watts. Corsair also confirmed that Nvidia’s next-gen graphics cards will stick with the 12V-2×6 connector that’s been at the center of melting issues on the RTX 4090.

Gamers won’t be cramming 72 RTX 5090s inside a PC, and the scale of overheating issues is different between a data center and a desktop PC. Still, if the Blackwell architecture is running into these issues in the data center, it could spell trouble for Nvidia’s desktop range.

For now, all we can do is wait. Nvidia is expected to reveal its RTX 50-series GPUs in January at CES 2025. Recent reports suggest Nvidia is winding down production of its RTX 40-series cards, most likely clearing the way for next-gen options.
