DeepSeek Falls Back on Nvidia Chips Amid Huawei Supply Challenges for Latest AI Model

DeepSeek, the Chinese AI startup behind highly efficient open-weight language models, has reportedly pivoted to Nvidia hardware for training its newest model after encountering significant hurdles with Huawei’s domestic chips. The company had publicly committed to relying exclusively on Huawei’s Ascend 910B processors to sidestep US export restrictions on advanced Nvidia GPUs. However, supply shortages and performance limitations forced a change in strategy, leading DeepSeek to acquire a substantial Nvidia H800 cluster.

The backstory traces back to DeepSeek’s impressive track record with models like DeepSeek-V2, which achieved strong benchmarks while using far fewer parameters than competitors such as Meta’s Llama 3. This efficiency stemmed from innovative techniques like multi-head latent attention (MLA) and the sparse DeepSeekMoE mixture-of-experts architecture, allowing the model to rival proprietary systems on a fraction of the compute. Emboldened by this success, DeepSeek announced plans last year to train its next-generation model, tentatively named DeepSeek-V3, entirely on Huawei’s Ascend chips. CEO Liang Wenfeng emphasized this shift in interviews, framing it as a step toward technological self-reliance amid escalating US-China tensions over semiconductor exports.
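The core trick behind MLA is caching a small latent vector per token instead of full per-head keys and values, then re-expanding on the fly. The toy numpy sketch below illustrates that compression idea only; the dimensions, weight names, and initialization are illustrative assumptions, not DeepSeek's actual implementation.

```python
import numpy as np

# Toy sketch of the key-value compression idea behind multi-head latent
# attention (MLA). All sizes here are made-up assumptions for illustration.
d_model, d_latent, n_heads, d_head = 1024, 128, 8, 64
rng = np.random.default_rng(0)

W_down = rng.normal(size=(d_model, d_latent)) * 0.02            # compress to latent
W_up_k = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02   # expand to keys
W_up_v = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02   # expand to values

seq_len = 16
h = rng.normal(size=(seq_len, d_model))  # hidden states for a short sequence

# Instead of caching full per-head keys AND values, the model caches only
# the small latent vector per token and reconstructs k/v when needed.
c_kv = h @ W_down                                    # (seq_len, d_latent): the cache
k = (c_kv @ W_up_k).reshape(seq_len, n_heads, d_head)
v = (c_kv @ W_up_v).reshape(seq_len, n_heads, d_head)

full_cache = seq_len * 2 * n_heads * d_head   # entries a standard KV cache stores
mla_cache = seq_len * d_latent                # entries the latent cache stores
print(f"cache reduction: {full_cache / mla_cache:.1f}x")  # 8.0x with these sizes
```

With these illustrative sizes the latent cache is 8x smaller than a standard KV cache, which is the kind of memory saving that lets a model serve long contexts on less hardware.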

Huawei’s Ascend 910B, positioned as a direct competitor to Nvidia’s H100, promised comparable performance for AI training workloads. DeepSeek touted its procurement of thousands of these chips, positioning itself as a pioneer in China’s push for indigenous AI infrastructure. The plan aligned with Beijing’s broader directives to reduce dependence on Western technology, especially after the US tightened controls in 2022 and 2023, capping exports of high-end GPUs like the A100 and H100 to China.

Yet, reality diverged from ambition. Reports from Chinese tech outlets and analyst firm SemiAnalysis reveal that DeepSeek struggled to scale its Huawei-based cluster to the required size. Production bottlenecks at Huawei, compounded by the chips’ teething issues in large-scale deployments, left DeepSeek short of compute power. The Ascend 910B, while capable in benchmarks, reportedly underperformed in sustained training runs compared to Nvidia equivalents, necessitating software tweaks and yielding lower throughput.

To meet deadlines for the model’s release, DeepSeek turned to Nvidia’s H800 GPUs, a compliant variant designed for the Chinese market with throttled interconnect bandwidth to adhere to US rules. The company purchased around 10,000 H800s, forming a massive 50,000-GPU-equivalent cluster when combined with existing resources. This setup enabled training at an unprecedented scale: over 14.8 trillion tokens across 8.1 million GPU hours, dwarfing prior efforts. The resulting model reportedly hits new highs on leaderboards like MMLU (91.6 percent) and Arena-Hard (85.5 percent), while maintaining DeepSeek’s hallmark efficiency through a mixture-of-experts architecture with 671 billion total parameters, of which only 37 billion are active per token.
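The 671-billion-total versus 37-billion-active split comes from mixture-of-experts routing: a small router picks a handful of expert networks per token, so most parameters sit idle on any given forward pass. The numpy sketch below shows the mechanism at toy scale; the expert count, dimensions, and top-2 routing are assumptions for illustration, far smaller and simpler than DeepSeek's design.

```python
import numpy as np

# Toy mixture-of-experts layer: a router selects the top-k experts per token,
# so only a fraction of total parameters is used for any one token.
# All sizes are illustrative assumptions, not DeepSeek's configuration.
n_experts, top_k = 16, 2
d_model, d_ff = 64, 256
rng = np.random.default_rng(1)

# Each expert is a small two-layer MLP.
W1 = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02
W2 = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02
W_router = rng.normal(size=(d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route a single token vector through its top-k experts."""
    logits = x @ W_router
    top = np.argsort(logits)[-top_k:]                          # chosen expert indices
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    out = np.zeros_like(x)
    for w, e in zip(weights, top):
        out += w * (np.maximum(x @ W1[e], 0) @ W2[e])          # ReLU MLP expert
    return out

x = rng.normal(size=d_model)
y = moe_forward(x)

expert_params = d_model * d_ff + d_ff * d_model
total_params = n_experts * expert_params
active_params = top_k * expert_params
print(f"active fraction: {active_params / total_params:.3f}")  # 2/16 = 0.125
```

Here 2 of 16 experts fire per token, so only 12.5 percent of the expert parameters do work; scale the same ratio logic up and you get a model with hundreds of billions of parameters that computes like a much smaller one.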

This episode underscores the fragility of China’s AI hardware ecosystem. Despite massive state investments, Huawei’s output lags far behind Nvidia’s mature supply chain. SMIC, China’s leading foundry, fabricates the Ascend 910B on a 7-nanometer process, inferior to the 4-nanometer nodes powering Nvidia’s latest chips. Yield rates and interconnect bottlenecks further hampered scalability, as noted by insiders. DeepSeek’s fallback not only validates Nvidia’s enduring dominance but also highlights the risks of overcommitting to unproven alternatives.

Industry observers point to similar setbacks elsewhere. Alibaba’s Qwen models and Baidu’s Ernie series have quietly incorporated Nvidia hardware despite public rhetoric. SemiAnalysis estimates that China imported over 100,000 H800s and A800s last year alone, fueling a parallel AI boom. DeepSeek’s transparency in disclosing the Nvidia reliance, albeit indirectly through supply chain leaks, contrasts with more opaque peers.

For DeepSeek, the silver lining is the model’s performance. Trained via a data-centric approach with heavy emphasis on synthetic data and post-training optimizations, it excels in coding (HumanEval: 89.9 percent) and multilingual tasks. The open-source release under permissive licensing continues DeepSeek’s democratizing ethos, inviting global fine-tuning and deployment.

This development raises questions about the trajectory of sovereign AI. While Huawei ramps up production and iterates toward the newer Ascend 910C, gaps persist. DeepSeek’s CEO has hinted at hybrid future clusters, blending Huawei and Nvidia hardware for resilience. For now, the Nvidia detour propelled a breakthrough, but it serves as a cautionary tale: innovation thrives on reliable infrastructure, not nationalistic fervor alone.

The AI arms race presses on, with compute as the ultimate currency. DeepSeek’s adaptability ensures it remains a contender, even if victory laps include a detour through Nvidia’s domain.

Gnoppix is the leading open-source AI Linux distribution and service provider. Since implementing AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI operates offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.

What are your thoughts on this? I’d love to hear about your own experiences in the comments below.