- SambaNova runs DeepSeek-R1 at 198 tokens/sec using 16 custom chips
- The SN40L RDU chip is reportedly 3x faster and 5x more efficient than GPUs
- A 5x speed boost is promised soon, with 100x cloud capacity by the end of the year
SambaNova Systems, a startup founded in 2017 by experts from Sun/Oracle and Stanford University, has announced what it claims is the world's fastest deployment of the DeepSeek-R1 671B LLM to date.
The company says it has achieved 198 tokens per second per user using just 16 custom chips, replacing the 40 racks of 320 Nvidia GPUs that would typically be required.
Independently verified
“Powered by the SN40L RDU chip, SambaNova is the fastest platform running DeepSeek,” said Rodrigo Liang, CEO and co-founder of SambaNova. “This will increase to 5x faster than the latest GPU speed on a single rack, and by the end of the year we will offer 100x capacity for DeepSeek-R1.”
While Nvidia GPUs have traditionally powered large AI workloads, SambaNova argues that its reconfigurable dataflow architecture offers a more efficient alternative. The company claims its hardware delivers three times the speed and five times the efficiency of leading GPUs while maintaining the full reasoning power of DeepSeek-R1.
“DeepSeek-R1 is one of the most advanced frontier models available, but its full potential has been limited by the inefficiency of GPUs,” Liang said. “That changes today. We are bringing the next major breakthrough, collapsing inference costs and reducing the hardware requirement from 40 racks to just one, to offer DeepSeek-R1 at faster speeds, efficiently.”
George Cameron, co-founder of AI benchmarking firm Artificial Analysis, said his company had “independently benchmarked SambaNova's cloud deployment of the full 671-billion-parameter DeepSeek-R1 Mixture-of-Experts model at over 195 output tokens/s, the fastest output speed we have measured. Output speeds are particularly important for reasoning models, as these models use reasoning output tokens to improve the quality of their responses.”
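To see why output speed matters so much for reasoning models, a rough back-of-the-envelope sketch helps; the token counts below are illustrative assumptions, not figures from SambaNova or Artificial Analysis.

```python
# Rough illustration: how output speed affects end-to-end latency for a
# reasoning model that emits hidden "thinking" tokens before its answer.
# All token counts and the GPU rate are assumed for illustration only.

reasoning_tokens = 4000   # assumed hidden chain-of-thought tokens
answer_tokens = 500       # assumed visible answer tokens

for label, tokens_per_sec in [("~195 tok/s (reported SambaNova rate)", 195),
                              ("~50 tok/s (assumed typical GPU serving rate)", 50)]:
    latency = (reasoning_tokens + answer_tokens) / tokens_per_sec
    print(f"{label}: ~{latency:.0f} s to finish the response")
```

At the same total token budget, a roughly 4x faster output rate cuts the wait for a reasoned answer by the same factor.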
DeepSeek-R1 671B is now available on SambaNova Cloud, with API access offered to select users. The company is scaling capacity rapidly and says it expects to reach 20,000 tokens per second of total rack throughput “in the near future”.
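For developers with API access, SambaNova Cloud exposes an OpenAI-compatible chat endpoint, so a deployment like this could be queried along the lines of the sketch below; the base URL, model identifier, and environment variable name are assumptions and should be checked against SambaNova's documentation.

```python
# Minimal sketch of querying DeepSeek-R1 on SambaNova Cloud via an
# OpenAI-compatible client. Endpoint, model name, and env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",    # assumed endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],   # assumed environment variable
)

response = client.chat.completions.create(
    model="DeepSeek-R1",                       # assumed model identifier
    messages=[{"role": "user",
               "content": "Explain dataflow architectures in two sentences."}],
)
print(response.choices[0].message.content)
```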