- Nvidia's $20 billion Groq deal signals an important shift in the AI market.
- AI workloads are moving from model training to real-time inference as the main focus.
- Specialized inference chips like Groq's LPUs offer speed and efficiency over traditional GPUs.
For years, Nvidia's rise has been synonymous with one idea: GPUs are the engine of artificial intelligence. They powered the training boom that turned large language models from academic curiosities into trillion-dollar ambitions. But Nvidia's $20 billion deal with Groq is an admission that the next phase of AI won't be won by GPUs alone.
Groq makes a very different type of AI chip called a Language Processing Unit, or LPU. To understand why Nvidia spent so much, and why it didn't simply build this technology itself, you have to look at where AI workloads are heading. The industry is moving from training models to running them in the real world. That shift has a name: inference.
Inference is what happens after a model is trained, when it answers questions, generates images, or carries on conversations with users. It's becoming the dominant task in AI computing, and could dwarf the training market in the future, according to estimates recently compiled by RBC Capital analysts.
This matters because inference has very different needs than training. Training is like building a brain: it requires massive amounts of raw computing power and flexibility. Inference is more like using that brain in real time. Speed, consistency, power efficiency, and cost per answer suddenly matter far more than brute force.
That's where Groq comes in. Founded by former Google engineers, Groq built its business around inference-only chips. Its LPUs are designed less like general-purpose factories and more like precision assembly lines. Every operation is planned in advance, executed in a fixed order, and repeated perfectly each time. That rigidity is a weakness for training, but a strength for inference, where predictability translates into lower latency and less wasted energy.
By contrast, Nvidia's Graphics Processing Units, or GPUs, are designed to be flexible. They rely on schedulers and large pools of external memory to juggle many kinds of workloads. That flexibility is why GPUs won the training market, but it also creates overhead that slows inference down. As AI products mature and stabilize, that tradeoff becomes harder to justify.
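To make that contrast concrete, here is a minimal, purely illustrative Python sketch — an analogy, not Groq's or Nvidia's actual design — of the difference the article is describing: one executor runs a schedule fixed ahead of time, while the other decides at runtime what to run next and pays a little scheduling overhead on every step.

```python
import heapq

# Illustrative analogy only, not real chip code: a "static" executor runs a
# precompiled, fixed schedule (the assembly-line model), while a "dynamic"
# executor picks the next task at runtime (the flexible, scheduler-driven model).

def static_executor(schedule):
    """Run steps in a fixed, precompiled order; per-step timing is predictable."""
    results = []
    for step in schedule:                 # order was decided ahead of time
        results.append(step())            # just execute, no runtime decisions
    return results

def dynamic_executor(tasks):
    """Choose the next task at runtime from a priority queue (scheduling overhead)."""
    heap = [(priority, i, task) for i, (priority, task) in enumerate(tasks)]
    heapq.heapify(heap)                   # runtime bookkeeping the static path avoids
    results = []
    while heap:
        _, _, task = heapq.heappop(heap)  # a scheduling decision on every step
        results.append(task())
    return results

# Toy "operations" standing in for the matrix math a real accelerator would run.
ops = [lambda i=i: i * i for i in range(5)]

print(static_executor(ops))                                        # [0, 1, 4, 9, 16]
print(dynamic_executor(list(zip([3, 1, 4, 1, 5], ops))))           # order picked at runtime
```

The point of the toy is only that the static path has no decisions left to make while it runs, which is the predictability that translates into lower latency for repetitive inference work.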
"The tectonic plates of the semiconductor industry just shifted again," Tony Fadell, creator of the iPod and an investor in Groq, wrote on LinkedIn recently. "GPUs decisively won the first wave of AI data centers: training. But inference was always going to be the real volume game, and GPUs by design aren't optimized for it."
Fadell calls this new breed of AI chips "IPUs," or Inference Processing Units.
An explosion of different chips
TD Cowen analysts noted this week that Nvidia's embrace of not just an inference-specific chip, but a whole new architecture, shows how large and mature the inference market has become.
Earlier AI infrastructure investments were driven by training-first buying decisions. The adage used to be "today's training chips are tomorrow's inference engines," which favored Nvidia's GPUs, but that's no longer the case, the analysts added.
Instead, there will be an explosion of different chips inside future AI data centers, according to Chris Lattner, an industry visionary who helped develop the software for Google's TPU AI chips, which Groq founder Jonathan Ross co-designed.
This move beyond GPUs is being driven by two trends that have been reinforced by Nvidia's Groq deal, Lattner told me this week.
"The first is that 'AI' is not a single workload — there are lots of different workloads for inference and training," he said. "The second is that hardware specialization leads to huge efficiency gains."
"Humble move"
In a 2024 story (that aged very well), Business Insider warned readers that inference could be a vulnerability for Nvidia as rivals looked to fill this strategic gap. Cerebras built massive AI chips optimized for speed, claiming memory bandwidth thousands of times higher than Nvidia's flagship GPU offering at the time. Google's TPUs are designed to efficiently run bespoke AI workloads at blazing speeds. Amazon developed its own inference chip, Inferentia. Startups like Positron AI argued they could beat or match Nvidia's inference performance at a fraction of the cost.
So Nvidia's deal with Groq can be seen as a preemptive move. Rather than letting inference specialists chip away at its dominance, Nvidia chose to embrace a fundamentally different architecture.
Fadell described the deal as a "humble move" by Nvidia CEO Jensen Huang. "Many companies miss inflection points like this due to 'Not Invented Here'-driven egos," Fadell added. "Jensen doesn't; he saw the threat and made it work to his advantage."
The economics of inference
The economics are compelling. Inference is where AI products make money. It's the phase that proves whether hundreds of billions of dollars spent on data centers will ever pay off. As AWS CEO Matt Garman put it in 2024, if inference doesn't dominate, "all this investment in these big models isn't really going to pay off."
Importantly, Nvidia isn't betting on a single winner. GPUs will still handle training and flexible workloads. Specialized chips like Groq's will handle fast, real-time inference. Nvidia's advantage lies in owning the connective tissue — the software, networking, and developer ecosystem that lets these components work together.
"AI datacenters are becoming hybrid environments where GPUs and custom ASICs operate side-by-side, each optimized for different workload types," RBC analysts wrote in a recent note, referring to Application-Specific Integrated Circuits such as Groq's LPUs.
Some competitors argue the deal proves GPUs are ill-suited for high-speed inference. Others see it as validation of a more fragmented future, where different chips serve different needs. Nvidia's Huang appears firmly in the second camp. By licensing Groq's technology and bringing its team inside the tent, Nvidia ensures it can offer customers both the shovels and the assembly lines of AI.
Indeed, Nvidia has developed NVLink Fusion, a technology that lets custom chips from other companies connect directly to its GPUs, reinforcing this mixed-hardware future, the RBC Capital analysts noted.
"GPUs are phenomenal accelerators," Andrew Feldman, CEO of Cerebras, wrote recently. "They've gotten us far in AI. They're just not the right machine for high-speed inference. And there are other architectures that are. And Nvidia has just spent $20B to corroborate this."