- The GSI Gemini-I APU reduces the constant shuffling of data between the processor and memory systems.
- Complete retrieval tasks up to 80% faster than comparable CPUs
- GSI Gemini-II APU will offer ten times greater performance
GSI Technology is pioneering a new approach to AI processing that puts computing directly in memory.
A new study from Cornell University draws attention to this design, known as the associative processing unit (APU).
It aims to push long-standing limits on performance and efficiency, suggesting it could challenge the dominance of the best GPUs currently used in AI tools and data centers.
A new competitor in AI hardware
Published by the ACM and presented at the recent MICRO ’25 conference, the Cornell research evaluated GSI’s Gemini-I APU against leading CPUs and GPUs, including Nvidia’s A6000, using Retrieval-Augmented Generation (RAG) workloads.
The tests covered data sets of 10 to 200 GB, representing realistic AI inference conditions.
By performing calculations within static RAM (SRAM) itself, the APU reduces the constant shuffling of data between the processor and memory, a key source of power consumption and latency in conventional GPU architectures.
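To see why retrieval workloads are so memory-bound, note that the retrieval step of RAG is essentially a nearest-neighbor search: a query vector is scored against every stored document embedding, so the entire corpus must stream from memory to the processor. A minimal pure-Python sketch (illustrative only; the function names and toy data here are invented for this example, production systems use optimized vector indexes, and GSI's APU implementation is not public):

```python
import math

def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    # Euclidean length of a vector.
    return math.sqrt(dot(v, v))

def top_k(query, corpus, k=2):
    """Rank corpus vectors by cosine similarity to the query.

    Every candidate vector must travel from memory to the processor
    just to be scored -- the data movement that in-memory
    architectures aim to eliminate by computing where the data sits.
    """
    scored = []
    for idx, vec in enumerate(corpus):
        sim = dot(query, vec) / (norm(query) * norm(vec))
        scored.append((sim, idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy corpus of 3-dimensional "embeddings".
corpus = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query = [1.0, 0.0, 0.1]
print(top_k(query, corpus))  # prints [0, 2]: the two vectors most aligned with the query
```

At a 10–200 GB corpus scale, as in the Cornell tests, the scoring loop above is dominated by memory traffic rather than arithmetic, which is why cutting data movement pays off so directly.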
The results showed that the APU could achieve GPU-class performance while consuming much less power.
GSI reported that its APU used up to 98% less power than a standard GPU and completed retrieval tasks up to 80% faster than comparable CPUs.
Such efficiency could make it attractive for edge devices such as drones, IoT systems and robotics, as well as for aerospace and defense use, where power and cooling limits are strict.
Despite these findings, it is still unclear whether in-memory computing technology can scale to the same level of maturity and support enjoyed by the best GPU platforms.
Currently, GPUs benefit from well-developed software ecosystems that enable seamless integration with leading AI tools.
For in-memory computing devices, optimization and scheduling remain emerging areas that could hold back broader adoption, especially in large data center operations.
GSI Technology says it continues to refine its hardware, and the Gemini-II generation is expected to offer ten times the performance with lower latency.
Another design, called Plato, is being developed to further extend the computing performance of embedded edge systems.
“Cornell’s independent validation confirms what we have long believed: in-memory computing has the potential to disrupt the $100 billion AI inference market,” said Lee-Lean Shu, president and CEO of GSI Technology.
“The APU delivers GPU-class performance at a fraction of the power cost, thanks to its highly efficient memory-centric architecture. Our recently released second-generation APU silicon, Gemini-II, can deliver approximately 10x faster performance and even lower latency for memory-intensive AI workloads.”
Via TechPowerUp