Yet another tech startup wants to topple Nvidia with 'orders of magnitude' better energy efficiency; Sagence AI bets on analog computing in memory to deliver 666,000 tokens/s in Llama2-70B

Sagence brings in-memory analog computing to redefine AI inference
Ten times lower power and 20 times lower cost
It also offers integration with PyTorch and TensorFlow.

Sagence AI has introduced an advanced in-memory analog computing architecture designed to address power, cost and scalability issues in AI inference.

Using an analog approach, the architecture offers improvements in energy efficiency and cost-effectiveness, while delivering performance comparable to existing high-end CPU and GPU systems.

This bold move positions Sagence AI as a potential disruptor in a market dominated by Nvidia.

Efficiency and performance

Sagence says the architecture's benefits show up when running large language models such as Llama2-70B. When normalized to 666,000 tokens per second, the company claims its technology delivers 10x lower power consumption, 20x lower cost, and 20x less rack space than leading GPU-based solutions.
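To make "normalized to 666,000 tokens per second" concrete, here is a minimal sketch of how such a comparison can be computed: scale each system up to the same target throughput, then compare total power. The function names and every per-unit figure in the usage lines are hypothetical placeholders chosen only to show the arithmetic; they are not measurements of Sagence or GPU hardware, and the article does not publish the underlying baseline.

```python
import math

TARGET_TOKENS_PER_S = 666_000  # throughput figure quoted in the article


def normalize(tokens_per_s_per_unit, watts_per_unit, target=TARGET_TOKENS_PER_S):
    """Return (units_needed, total_watts) to reach the target throughput."""
    units = math.ceil(target / tokens_per_s_per_unit)
    return units, units * watts_per_unit


def power_ratio(baseline, candidate, target=TARGET_TOKENS_PER_S):
    """How many times more power the baseline draws at the same throughput."""
    _, baseline_watts = normalize(*baseline, target=target)
    _, candidate_watts = normalize(*candidate, target=target)
    return baseline_watts / candidate_watts


# HYPOTHETICAL placeholder inputs: (tokens/s per unit, watts per unit).
baseline_unit = (1_000, 400)
candidate_unit = (3_000, 150)

print(normalize(*baseline_unit))
print(normalize(*candidate_unit))
print(power_ratio(baseline_unit, candidate_unit))
```

The same pattern applies to cost or rack space: replace watts per unit with dollars or rack units per device and compare the totals at the shared throughput target.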

This design prioritizes inference demands over training, reflecting the shift in focus of AI computing within data centers. With its efficiency and affordability, Sagence offers a solution to the growing challenge of ensuring return on investment (ROI) as AI applications expand toward large-scale deployment.

At the heart of Sagence’s innovation is its analog in-memory computing technology, which merges storage and computing within memory cells. By eliminating the need for separate storage and scheduled multiply-accumulate circuits, this approach simplifies chip designs, reduces costs, and improves power efficiency.

Sagence also employs deep subthreshold computing in multi-level memory cells, which the company describes as an industry first, to achieve the efficiency gains needed for scalable AI inference.
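The idea of computing inside multi-level memory cells can be illustrated with a generic toy model. The sketch below is not Sagence's design: it assumes the textbook picture of analog in-memory computing, in which each cell stores a weight as one of a few conductance levels, inputs arrive as voltages, and the currents flowing through the cells add up on a shared wire to form a multiply-accumulate result. All function names and the number of levels are illustrative assumptions, and signed weights, noise, and analog-to-digital conversion are omitted for brevity.

```python
def quantize_to_levels(weight, levels=8, w_max=1.0):
    """Snap a weight in [0, w_max] to the nearest of `levels` evenly spaced cell states."""
    step = w_max / (levels - 1)
    clipped = min(max(weight, 0.0), w_max)
    return round(clipped / step) * step


def analog_mac(voltages, conductances):
    """Current on one output line: each cell contributes V * G, and the currents sum."""
    return sum(v * g for v, g in zip(voltages, conductances))


def crossbar_matvec(weights, inputs, levels=8):
    """Matrix-vector product where every row of `weights` is stored in one line of cells."""
    outputs = []
    for row in weights:
        cells = [quantize_to_levels(w, levels) for w in row]  # weights live in the cells
        outputs.append(analog_mac(inputs, cells))             # compute happens in place
    return outputs


# Toy usage: a 2x2 weight matrix stored in 3-level cells.
print(crossbar_matvec([[0.0, 1.0], [0.5, 0.5]], [1.0, 2.0], levels=3))
```

In this picture the weights never leave the array, so there is no separate fetch of weights into a digital multiply-accumulate unit. The number of levels per cell sets how coarsely each weight is quantized.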

Traditional CPU- and GPU-based systems rely on complex dynamic scheduling, which increases hardware demands, inefficiencies, and power consumption. Sagence’s statically scheduled architecture simplifies these processes, mirroring biological neural networks.

The system is also designed to integrate with existing AI development frameworks such as PyTorch, ONNX, and TensorFlow. Once trained neural networks are imported, Sagence’s architecture eliminates the need for additional GPU-based processing, simplifying deployment and reducing costs.

“A fundamental advance in AI inference hardware is vital to the future of AI. The use of large language models (LLMs) and generative AI drives demand for rapid, massive change at the core of computing, requiring an unprecedented combination of higher performance with lower power and economics that match costs with the value created,” said Vishal Sarin, CEO and founder of Sagence AI.

“Today’s legacy computing devices that are capable of high-performance AI inference cost too much to be economically viable and consume too much energy to be environmentally sustainable. Our mission is to break those economic and performance limitations in an environmentally responsible way,” added Sarin.

Via IEEE Spectrum
