OpenAI signs $10 billion deal with Cerebras Systems

OpenAI Secures Landmark $10 Billion Partnership with Cerebras Systems for Advanced AI Inference

In a significant move to bolster its computational infrastructure, OpenAI has entered into a multi-year agreement with Cerebras Systems valued at up to $10 billion. The partnership positions Cerebras as a key supplier of high-performance AI inference hardware, enabling OpenAI to scale the deployment of large language models such as GPT-4o and their successors. The deal underscores intensifying competition in the AI hardware sector, where demand for efficient inference is surging as models move into real-world applications.

Cerebras Systems, a pioneering AI chipmaker founded in 2015, will provide OpenAI with its flagship CS-3 AI inference systems. These systems are engineered specifically for inference workloads, the phase where trained AI models generate outputs in response to user queries. Unlike training, which requires immense computational power for model development, inference demands low-latency, high-throughput processing to handle millions of simultaneous requests. The CS-3 systems address this by leveraging Cerebras’ proprietary Wafer-Scale Engine (WSE), recognized as the world’s largest chip.
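To make the latency-versus-throughput distinction concrete, the short Python sketch below times a single chat-completion request against an OpenAI-compatible HTTP endpoint and derives a rough tokens-per-second figure. The endpoint URL, API key, and model name are placeholders for illustration, not details of the actual OpenAI or Cerebras deployment.

```python
# Illustrative latency/throughput probe for an OpenAI-compatible chat endpoint.
# URL, key, and model name below are placeholders, not confirmed deployment details.
import time
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                                  # placeholder credential
MODEL = "example-large-model"                             # placeholder model name

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Summarize wafer-scale inference in one sentence."}],
    "max_tokens": 128,
}

start = time.perf_counter()
resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
elapsed = time.perf_counter() - start

usage = resp.json().get("usage", {})
completion_tokens = usage.get("completion_tokens", 0)

print(f"request latency: {elapsed:.2f} s")
if completion_tokens:
    print(f"decode throughput: {completion_tokens / elapsed:.1f} tokens/s")
```

In production serving, the same two numbers (time to respond and tokens generated per second) are what low-latency, high-throughput inference hardware is meant to improve.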

The WSE-3, which powers the CS-3, integrates 4 trillion transistors on a single silicon wafer measuring 46,225 square millimeters. This monolithic design contrasts sharply with traditional GPU architectures, which tile together much smaller dies. The chip provides 900,000 AI-optimized cores and 125 petaflops of peak AI compute at FP16 precision, backed by 44 gigabytes of on-chip memory and 21 petabytes per second of memory bandwidth, minimizing the data-movement bottlenecks that plague conventional systems.
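As a rough illustration of why the bandwidth figure matters for inference, the back-of-envelope calculation below estimates a memory-bound decoding ceiling, assuming every FP16 weight of an illustrative 405-billion-parameter model is read once per generated token at the published 21 PB/s. It ignores KV-cache traffic, batching, and weight streaming from external memory, so it is an intuition aid, not a performance claim.

```python
# Back-of-envelope, memory-bound decode estimate. Treats the published 21 PB/s
# figure as the effective weight-read bandwidth and ignores KV-cache traffic,
# batching, and off-wafer weight streaming, so the result is only an upper bound.

params = 405e9          # illustrative model size (parameters)
bytes_per_param = 2     # FP16
bandwidth = 21e15       # bytes/s, published WSE-3 memory bandwidth

bytes_per_token = params * bytes_per_param      # every weight read once per decoded token
seconds_per_token = bytes_per_token / bandwidth

print(f"weight bytes per token: {bytes_per_token / 1e9:.0f} GB")
print(f"memory-bound ceiling:   {1 / seconds_per_token:,.0f} tokens/s")
```

The point of the exercise is that for autoregressive decoding, how fast weights can be moved to the compute units, rather than raw FLOPS, typically sets the ceiling.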

Under the agreement, Cerebras will deploy and operate clusters of these CS-3 systems directly within OpenAI’s data centers. This managed service model allows OpenAI to focus on model development and application deployment while Cerebras handles hardware provisioning, software optimization, and maintenance. The partnership is structured as capacity reservations, with OpenAI committing to substantial usage over multiple years, potentially scaling to thousands of CS-3 units. Initial deployments are slated to support inference for OpenAI’s most advanced models, reducing latency and costs compared to reliance on graphics processing units (GPUs).

This collaboration arrives at a pivotal moment for OpenAI. The company has faced challenges with GPU supply shortages, primarily from Nvidia, whose H100 and upcoming Blackwell GPUs dominate the market but come with escalating prices and long lead times. By partnering with Cerebras, OpenAI diversifies its supply chain, mitigating risks associated with single-vendor dependency. Cerebras’ wafer-scale approach offers advantages in inference efficiency, particularly for models exceeding hundreds of billions of parameters, where memory access and interconnect latency are critical hurdles.
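To see why interconnect traffic becomes a hurdle at that scale, the sketch below estimates the per-token communication volume when a large transformer is tensor-parallel sharded across several GPUs. The hidden size, layer count, device count, and link bandwidth are illustrative assumptions, not specifications of any particular product; on a single wafer, the equivalent traffic stays on-chip.

```python
# Rough illustration of per-token interconnect traffic for tensor-parallel decoding
# across several accelerators. All numbers are illustrative assumptions.

hidden = 16384        # illustrative hidden dimension
layers = 126          # illustrative transformer layer count
devices = 8           # tensor-parallel group size
bytes_per_act = 2     # FP16 activations
link_bw = 900e9       # bytes/s, illustrative per-device interconnect bandwidth

# Megatron-style tensor parallelism performs two all-reduces per layer (after
# attention and after the MLP); a ring all-reduce moves roughly 2*(N-1)/N of the
# activation vector per device.
allreduce_bytes = 2 * (devices - 1) / devices * hidden * bytes_per_act
per_token_bytes = 2 * layers * allreduce_bytes

print(f"interconnect traffic per decoded token: {per_token_bytes / 1e6:.1f} MB per device")
print(f"communication time per token:           {per_token_bytes / link_bw * 1e6:.0f} µs (bandwidth only)")
```

Even before per-hop latency is counted, this communication happens once per generated token, which is why interconnect overhead compounds quickly for very large sharded models.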

Cerebras CEO Andrew Feldman highlighted the strategic fit: “OpenAI is pushing the boundaries of AI, and inference at scale is the next frontier. Our CS-3 systems are purpose-built for this, delivering GPU cluster performance in a single chip with software that makes it seamless.” OpenAI’s leadership echoed this sentiment, noting the deal’s role in accelerating service delivery to millions of users worldwide.

Technically, the CS-3’s architecture revolves around a 2D mesh of cores interconnected via Cerebras’ on-wafer Swarm communication fabric, rated at 22 petabits per second of aggregate bandwidth, with a separate interconnect linking multiple systems into a cluster. Cerebras’ software stack, including the Cerebras Inference Engine, optimizes model partitioning, weight streaming, and token generation. For instance, Cerebras-published benchmarks report the CS-3 generating over 2,500 tokens per second on Llama 3.1 405B, which the company says outperforms Nvidia GPU clusters in throughput per watt.
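The sketch below illustrates the weight-streaming idea in generic NumPy terms: activations stay resident on the accelerator while each layer’s weights are fetched in turn from external memory. It is a conceptual illustration under those assumptions only, not the Cerebras Inference Engine API.

```python
# Conceptual sketch of weight streaming: layer weights are fetched from external
# memory one layer at a time and applied to activations that remain resident on
# the accelerator. NumPy is used purely to show the data flow; this is not the
# Cerebras software stack.
import numpy as np

HIDDEN = 1024       # illustrative hidden size
LAYERS = 4          # illustrative layer count
rng = np.random.default_rng(0)

def fetch_layer_weights(layer_idx: int) -> np.ndarray:
    """Stand-in for streaming one layer's weights in from external memory."""
    return rng.standard_normal((HIDDEN, HIDDEN), dtype=np.float32) * 0.01

def run_layer(acts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Stand-in for the on-wafer compute applied to the streamed-in layer."""
    return np.tanh(acts @ weights)

activations = rng.standard_normal((1, HIDDEN), dtype=np.float32)  # stays "on chip"
for layer in range(LAYERS):
    weights = fetch_layer_weights(layer)   # streamed in, one layer at a time
    activations = run_layer(activations, weights)

print("output activation norm:", float(np.linalg.norm(activations)))
```

The design point is that model size is no longer limited by what fits in on-chip memory at once, at the cost of making weight-delivery bandwidth the quantity that must be engineered around.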

Cerebras has a track record of deploying such systems at scale: its systems power the Condor Galaxy line of AI supercomputers, which rank among the largest machines built for AI workloads. The company has also secured contracts with entities such as the U.S. Department of Energy and pharmaceutical firms for drug-discovery simulations. The OpenAI deal represents Cerebras’ largest commercial commitment to date, validating its wafer-scale paradigm in a market where inference demand is projected to grow rapidly.

The partnership also reflects broader industry shifts. As AI models mature, inference now accounts for the majority of operational costs, prompting innovations in specialized hardware. Cerebras’ approach challenges the multi-chip module trend epitomized by Nvidia’s Grace Hopper superchips, arguing that wafer-scale integration eliminates packaging overhead and inter-die bandwidth limitations. While Nvidia remains the incumbent with mature ecosystems, Cerebras’ focus on inference positions it as a complementary player, potentially capturing a slice of the $100 billion-plus AI accelerator market.

OpenAI’s move signals confidence in alternative architectures, complementing its reported in-house chip design efforts and the “Stargate” data center initiative. The Cerebras deal, however, provides more immediate relief, with systems expected online within the coming quarters. Analysts view the arrangement as a win-win: OpenAI gains cost-effective scale, while Cerebras achieves the revenue visibility to fund development of the WSE-4, which promises even greater density.

In summary, this $10 billion pact marks a milestone in AI infrastructure evolution, blending Cerebras’ hardware innovation with OpenAI’s model leadership to drive the next era of intelligent applications.

Gnoppix is the leading open-source AI Linux distribution and service provider. Since implementing AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI operates offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.

What are your thoughts on this? I’d love to hear about your own experiences in the comments below.