- GPUs handle prefill operations, converting prompts into key-value caches
- SambaNova RDUs generate tokens with high throughput and low latency
- Intel Xeon 6 processors manage workload distribution and execute compiled code
Intel and SambaNova Systems have introduced a joint hardware model that combines GPUs, SambaNova RDUs, and Intel Xeon 6 processors for large-scale inference workloads.
The system allocates GPUs for prefill operations, RDUs for decoding, and Xeon CPUs for execution and orchestration tasks in agent-controlled environments.
“Agent AI is entering production, and the winning pattern we’re seeing is GPUs to start the job, Intel Xeon 6 to run it, and SambaNova RDUs to finish it quickly,” said Rodrigo Liang, CEO and co-founder of SambaNova Systems.
The CPU is the execution and control layer.
This design is planned to be available in the second half of 2026 for enterprises, cloud providers, and sovereign deployments.
The architecture places Intel Xeon 6 processors at the center of system control, where they manage workload distribution, execute code, and coordinate tool interactions.
This work includes handling code builds, validating results, and maintaining communication between concurrent processes.
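The orchestration role described above can be sketched in a few lines. This is an illustrative sketch only, not the companies' actual software: the function names and the thread-pool approach are assumptions chosen to show a host CPU fanning out concurrent agent tasks, collecting results, and validating them.

```python
import concurrent.futures

def run_agent_task(task_id: int) -> dict:
    """Hypothetical agent step: in a real system this would issue a tool
    call or a build job; here it simply returns a result record."""
    return {"task": task_id, "status": "ok"}

def orchestrate(num_agents: int) -> list[dict]:
    # The host CPU distributes concurrent agent tasks and gathers results,
    # maintaining communication between the concurrent processes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_agent_task, range(num_agents)))
    # Validation step: reject the batch if any task did not report success.
    assert all(r["status"] == "ok" for r in results)
    return results

print(len(orchestrate(16)))  # 16 completed agent tasks
```

The point of the sketch is that dispatch, validation, and inter-process coordination all run on the host CPU, which is why the companies position it as a control layer rather than a background component.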
“When thousands of concurrent agents generate tool calls, fetch requests, code builds, and encrypted messages between agents, the CPU is not a background component, it is the execution and action layer of the system,” said Harry Ault, CRO at SambaNova.
The statement defines the CPU as the primary layer responsible for system behavior rather than a supporting component.
According to SambaNova, Xeon 6 offers more than 50% faster LLVM compile times compared to Arm-based server CPUs.
It also delivers up to 70% faster vector database performance compared to other x86-based systems.
These figures relate to execution speed within the compilation and retrieval workflows. In this configuration, the GPUs handle the prefill stage, converting prompts into key-value caches.
SambaNova RDUs function as a decoding layer, generating tokens with high throughput and low latency.
Xeon 6 processors serve as host CPUs and execution engines, managing system-level operations and executing compiled workloads.
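The three-stage split described above (GPU prefill, RDU decode, CPU orchestration) can be illustrated with a toy pipeline. Everything here is a hypothetical stand-in: real deployments would use vendor runtimes, and the token arithmetic below only models the shape of the data flow, not actual inference.

```python
def prefill(prompt_tokens: list[int]) -> list[tuple[int, int]]:
    """GPU stage: process the whole prompt in one pass and return a
    key-value cache (modeled here as one (key, value) pair per token)."""
    return [(t, t * 2) for t in prompt_tokens]

def decode(kv_cache: list[tuple[int, int]], max_new: int) -> list[int]:
    """RDU stage: generate tokens one at a time, reusing the KV cache so
    each step is cheap. This sequential, latency-sensitive loop is why
    decoding favors high-throughput, low-latency hardware."""
    out = []
    last = kv_cache[-1][0]
    for _ in range(max_new):
        last = (last + 1) % 50000  # stand-in for the real sampling step
        out.append(last)
        kv_cache.append((last, last * 2))
    return out

# Host CPU orchestrates: hand the prompt to prefill, then the cache to decode.
cache = prefill([101, 7592, 2088])
tokens = decode(cache, max_new=4)
print(tokens)  # [2089, 2090, 2091, 2092]
```

The design choice the sketch highlights is that prefill is a parallel batch job while decode is a serial loop over a shared cache, so the two stages can reasonably be mapped to different hardware with the CPU passing state between them.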
“Production inference is moving toward heterogeneous hardware: no one type of chip is optimal for each stage of an agent workflow,” said Banghua Zhu, co-founder and CTO of RadixArk.
He added that combining RDU with Xeon CPUs allows systems to maintain compatibility with existing software environments.
The system is designed to operate within existing air-cooled data centers without the need for new construction.
According to the companies, this allows inference workloads to be expanded without putting additional pressure on water and energy resources.
As Nvidia and Groq continue to focus on improving inference performance and latency, this announcement adds a layer of competition, offering an alternative approach that distributes workloads across multiple hardware layers rather than relying on a single type of processor.