- UK-based Fractile is backed by NATO and wants to build faster, cheaper in-memory compute
- Nvidia's brute-force GPU approach consumes too much power and is held back by memory
- Fractile's numbers are based on comparisons against H100 GPU clusters, not the newer H200
Nvidia sits comfortably at the top of the AI hardware food chain, dominating the market with its high-performance GPUs and CUDA software stack, which have quickly become the default tools for training and running large AI models. But that dominance comes at a cost, namely a growing target on its back.
Hyperscalers such as Amazon, Google, Microsoft, and Meta are pouring resources into developing their own custom silicon in an effort to reduce their dependence on Nvidia's chips and cut costs. At the same time, a wave of new AI hardware companies is trying to capitalize on the growing demand for specialized accelerators, hoping to offer more efficient or affordable alternatives and, ultimately, displace Nvidia.
You may not have heard of UK-based Fractile yet, but the startup, which claims its revolutionary computing approach can run the world's largest language models 100 times faster and at 1/10th the cost of existing systems, has some notable backers, including NATO and former Intel CEO Pat Gelsinger.
Removing every bottleneck
"We are building the hardware that will remove every bottleneck to the fastest possible inference of the largest transformer networks," says Fractile.
"This means the biggest LLMs in the world running faster than you can read, and a whole new universe of capabilities and possibilities for how we work that will be unlocked by near-instant inference of models with superhuman intelligence."
It is worth noting, before getting too excited, that Fractile's performance numbers are based on comparisons against clusters of Nvidia H100 GPUs using 8-bit quantization and TensorRT-LLM, running Llama 2 70B, not the newer H200 chips.
In a LinkedIn post, Gelsinger, who recently joined VC firm Playground Global as a general partner, wrote: "Inference of frontier AI models is bottlenecked by hardware. Even before test-time compute scaling, cost and latency were huge challenges for large-scale LLM deployments … To achieve our aspirations for AI, we will need radically faster, cheaper, and much lower-power inference."
"I am pleased to share that I recently invested in Fractile, a UK-based AI hardware company that is pursuing a path radical enough to offer such a leap," he revealed.
"Its in-memory compute approach to inference acceleration jointly tackles the two bottlenecks to scaling inference, overcoming both the memory bottleneck that holds back today's GPUs, while decimating power consumption, the biggest physical constraint we face over the next decade in scaling data center capacity."
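To see why memory, rather than raw compute, is the bottleneck Gelsinger is describing, consider a rough back-of-envelope calculation: at batch size 1, generating each token requires streaming every model weight from memory, so token throughput is capped by memory bandwidth long before the GPU's arithmetic units are saturated. The sketch below illustrates this with round figures for Llama 2 70B at 8-bit precision and approximate H100 specs; these are illustrative assumptions, not Fractile's or Nvidia's published benchmark numbers.

```python
# Back-of-envelope: why single-stream LLM inference on a GPU is
# memory-bandwidth-bound, not compute-bound (illustrative figures only).

WEIGHT_BYTES = 70e9 * 1    # Llama 2 70B at 8-bit quantization: ~70 GB of weights
HBM_BANDWIDTH = 3.35e12    # H100 SXM HBM3 bandwidth, ~3.35 TB/s (approximate)
PEAK_OPS = 1.9e15          # H100 8-bit peak throughput, ~1.9 POPS (approximate)

# At batch size 1, every generated token must read all weights once,
# and performs roughly 2 operations (multiply + add) per weight.
bandwidth_limited_tps = HBM_BANDWIDTH / WEIGHT_BYTES
compute_limited_tps = PEAK_OPS / (2 * 70e9)

print(f"Tokens/s if limited by memory bandwidth: {bandwidth_limited_tps:,.0f}")
print(f"Tokens/s if limited by compute:          {compute_limited_tps:,.0f}")
# Bandwidth permits only ~48 tokens/s while the compute units could sustain
# thousands, so the GPU spends most of its time waiting on memory -- the
# bottleneck that in-memory compute designs aim to remove.
```

The gap between the two figures is the motivation for in-memory compute: by performing operations where the weights are stored, designs like Fractile's aim to avoid shuttling tens of gigabytes across a memory bus for every generated token.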