Cerebras just announced 6 new AI datacenters that process 40M tokens per second — and it could be bad news for Nvidia

Cerebras Systems, an AI hardware startup that has been steadily challenging Nvidia’s dominance in the artificial intelligence market, announced Tuesday a significant expansion of its data center footprint and two major enterprise partnerships that position the company to become the leading provider of high-speed AI inference services.
The company will add six new AI data centers across North America and Europe, increasing its inference capacity twentyfold to over 40 million tokens per second. The expansion includes facilities in Dallas, Minneapolis, Oklahoma City, Montreal, New York, and France, with 85% of the total capacity located in the United States.
“This year, our goal is to truly satisfy all the demand and all the new demand we expect will come online as a result of new models like Llama 4 and new DeepSeek models,” said James Wang, Director of Product Marketing at Cerebras, in an interview with VentureBeat. “This is our huge growth initiative this year to satisfy almost unlimited demand we’re seeing across the board for inference tokens.”
The data center expansion represents the company’s ambitious bet that the market for high-speed AI inference — the process where trained AI models generate outputs for real-world applications — will grow dramatically as companies seek faster alternatives to GPU-based solutions from Nvidia.

Strategic partnerships that bring high-speed AI to developers and financial analysts
Alongside the infrastructure expansion, Cerebras announced partnerships with Hugging Face, the popular AI developer platform, and AlphaSense, a market intelligence platform widely used in the financial services industry.
The Hugging Face integration will allow the platform's five million developers to access Cerebras Inference with a single click, without having to sign up for Cerebras separately. This represents a major distribution channel for Cerebras, particularly for developers working with open-source models like Llama 3.3 70B.
“Hugging Face is kind of the GitHub of AI and the center of all open source AI development,” Wang explained. “The integration is super nice and native. You just appear in their inference providers list. You just check the box and then you can use Cerebras right away.”
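For developers curious what that "check the box" flow looks like in practice, here is a minimal sketch of routing a Hugging Face inference call to Cerebras via the huggingface_hub client's inference-provider option. The exact provider string ("cerebras") and the model ID used below are assumptions based on the integration described in the announcement, not details confirmed by either company.

```python
# Minimal sketch: calling an open model through Hugging Face with Cerebras
# as the inference provider. Assumes huggingface_hub >= 0.28 (which added
# third-party provider routing) and that the provider identifier is
# "cerebras" -- check Hugging Face's provider docs before relying on this.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cerebras",   # assumed provider identifier
    api_key="hf_...",      # your Hugging Face access token (placeholder)
)

response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",  # open-weight model cited in the article
    messages=[{"role": "user", "content": "Summarize today's market news in one line."}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```

In this flow the developer keeps their existing Hugging Face account and token; only the provider setting changes, which is what Wang means by the integration appearing natively in Hugging Face's inference providers list.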
The AlphaSense partnership represents a significant enterprise customer win, with the financial intelligence platform switching from what Wang described as a “global, top three closed-source AI model vendor” to Cerebras. The company, which serves approximately 85% of Fortune 100 companies, is using Cerebras to accelerate its AI-powered search capabilities for market intelligence.
“This is a tremendous customer win and a very large contract for us,” Wang said. “We speed them up by 10x, so what used to take five seconds or longer basically becomes instant on Cerebras.”

How Cerebras is winning the race for AI inference speed as reasoning models slow down
Cerebras has been positioning itself as a specialist in high-speed inference, claiming its Wafer-Scale Engine (WSE-3) processor can run AI models 10 to 70 times faster than GPU-based solutions. This speed advantage has become increasingly valuable as AI models evolve toward more complex reasoning capabilities.
“If you listen to Jensen’s remarks, reasoning is the next big thing, even according to Nvidia,” Wang said, referring to Nvidia CEO Jensen Huang. “But what he’s not telling you is that reasoning makes the whole thing run 10 times slower because the model has to think and generate a bunch of internal monologue before it gives you the final answer.”
This slowdown creates an opportunity for Cerebras, whose specialized hardware is designed to accelerate these more complex AI workloads. The company has already secured high-profile customers including Perplexity AI and Mistral AI, who use Cerebras to power their AI search and assistant products, respectively.
“We help Perplexity become the world’s fastest AI search engine. This just isn’t possible otherwise,” Wang said. “We help Mistral achieve the same feat. Now they have a reason for people to subscribe to Le Chat Pro, whereas before, your model is probably not the same cutting-edge level as GPT-4.”

The compelling economics behind Cerebras’ challenge to OpenAI and Nvidia
Cerebras is betting that the combination of speed and cost will make its inference services attractive even to companies already using leading models like GPT-4.
Wang pointed out that Meta’s Llama 3.3 70B, an open-source model that Cerebras has optimized for its hardware, now scores the same on intelligence tests as OpenAI’s GPT-4, while costing significantly less to run.
“Anyone who is using GPT-4 today can just move to Llama 3.3 70B as a drop-in replacement,” he explained. “The price for GPT-4 is [about] $4.40 in blended terms. And Llama 3.3 is like 60 cents. We’re about 60 cents, right? So you reduce cost by almost an order of magnitude. And if you use Cerebras, you increase speed by another order of magnitude.”
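Taken at face value, Wang's figures support the claim, though the cost gap is closer to 7x than a full 10x. A back-of-the-envelope check, using the blended per-million-token prices he quotes (these are his rounded figures, not published price-list numbers):

```python
# Back-of-the-envelope check of the cost claim, using the blended
# per-million-token prices quoted in the interview above.
gpt4_price = 4.40   # approx. $ per million tokens, blended (Wang's figure)
llama_price = 0.60  # approx. $ per million tokens for Llama 3.3 70B on Cerebras

print(f"Cost reduction: {gpt4_price / llama_price:.1f}x")  # ~7.3x, i.e. "almost an order of magnitude"
```
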
Inside Cerebras’ tornado-proof data centers built for AI resilience
The company is making substantial investments in resilient infrastructure as part of its expansion. Its Oklahoma City facility, scheduled to come online in June 2025, is designed to withstand extreme weather events.
“Oklahoma, as you know, is kind of a tornado zone. So this data center actually is rated and designed to be fully resistant to tornadoes and seismic activity,” Wang said. “It will withstand the strongest tornado ever recorded. If that thing just goes through, this thing will just keep sending Llama tokens to developers.”
The Oklahoma City facility, operated in partnership with Scale Datacenter, will house over 300 Cerebras CS-3 systems and features triple redundant power stations and custom water-cooling solutions specifically designed for Cerebras’ wafer-scale systems.

From skepticism to market leadership: How Cerebras is proving its value
The expansion and partnerships announced today represent a significant milestone for Cerebras, which has been working to prove itself in an AI hardware market dominated by Nvidia.
“I think what was reasonable skepticism about customer uptake, maybe when we first launched, I think that is now fully put to bed, just given the diversity of logos we have,” Wang said.
The company is targeting three specific areas where fast inference provides the most value: real-time voice and video processing, reasoning models, and coding applications.
“Coding is one of these kind of in-between reasoning and regular Q&A that takes maybe 30 seconds to a minute to generate all the code,” Wang explained. “Speed is directly proportional to developer productivity. So having speed there matters.”
By focusing on high-speed inference rather than competing across all AI workloads, Cerebras has found a niche where it can claim leadership over even the largest cloud providers.
“Nobody generally competes against AWS and Azure on their scale. We don’t obviously reach full scale like them, but to be able to replicate a key segment… on the high-speed inference front, we will have more capacity than them,” Wang said.
Why Cerebras’ US-centric expansion matters for AI sovereignty and future workloads
The expansion comes at a time when the AI industry is increasingly focused on inference capabilities, as companies move from experimenting with generative AI to deploying it in production applications where speed and cost-efficiency are critical.
With 85% of its inference capacity located in the United States, Cerebras is also positioning itself as a key player in advancing domestic AI infrastructure at a time when technological sovereignty has become a national priority.
“Cerebras is turbocharging the future of U.S. AI leadership with unmatched performance, scale and efficiency – these new global datacenters will serve as the backbone for the next wave of AI innovation,” said Dhiraj Mallick, COO of Cerebras Systems, in the company’s announcement.
As reasoning models like DeepSeek R1 and OpenAI’s o3 become more prevalent, the demand for faster inference solutions is likely to grow. These models, which can take minutes to generate answers on traditional hardware, operate near-instantaneously on Cerebras systems, according to the company.
For technical decision makers evaluating AI infrastructure options, Cerebras’ expansion represents a significant new alternative to GPU-based solutions, particularly for applications where response time is critical to user experience.
Whether the company can truly challenge Nvidia’s dominance in the broader AI hardware market remains to be seen, but its focus on high-speed inference and substantial infrastructure investment demonstrates a clear strategy to carve out a valuable segment of the rapidly evolving AI landscape.