This massive technology alliance takes on InfiniBand with a dream of open Ethernet for next-generation AI superclusters.

InfiniBand’s long dominance faces real pressure from the open Ethernet standard movement
Meta and Nvidia bet on openness to scale AI networks
ESUN project links industry rivals through shared networking ambitions

The Open Compute Project (OCP) has announced a new initiative known as Ethernet for Scale-Up Networking (ESUN), which aims to develop open standards for high-performance connections within artificial intelligence clusters.

This collaboration brings together companies including Meta, Nvidia, AMD, Cisco, and OpenAI to explore how Ethernet can rival existing interconnects such as InfiniBand in large-scale data centers.

Other companies joining the collaboration include Arista, ARM, Broadcom, HPE Networking, Marvell, Microsoft and Oracle.

Open networks for AI clusters

InfiniBand has long dominated the high-speed AI networking market, accounting for approximately 80% of the infrastructure connecting GPUs and accelerators.

However, the ESUN group believes that the maturity, cost-effectiveness and interoperability of Ethernet make it a strong candidate for scaling AI clusters.

Unlike proprietary systems, Ethernet is broadly familiar to engineers, which could help reduce the complexity of managing massive AI workloads.

Supporters argue that using Ethernet as an open standard will allow operators to scale infrastructure while reducing costs.

OCP’s new initiative builds on previous work from its SUE-Transport (SUE-T) program, which explored Ethernet transport for multiprocessor systems.

ESUN participants will meet periodically to define standards for switch behavior, including protocol headers, error handling, and lossless data transfer.

The group will also study how network design affects load balancing and memory ordering within GPU-based systems.

It plans to coordinate with the Ultra Ethernet Consortium and the IEEE 802.3 standards body to ensure alignment across the broader Ethernet ecosystem.

Several companies have already developed Ethernet-based products aimed at scaling AI. Broadcom’s Tomahawk Ultra switch, for example, supports up to 77 billion packets per second, and Nvidia’s Spectrum-X platform combines Ethernet with acceleration hardware for AI clusters.

Meta, which co-founded OCP in 2011, sees ESUN as a natural extension of its push for open hardware within data centers.

Still, observers note that replacing established InfiniBand networks would require Ethernet to prove itself in the most demanding AI workloads, where latency and reliability are critical.

ESUN’s success will depend on balancing openness with performance. Proponents see a future in which AI systems will run on interoperable hardware using standardized Ethernet technologies.

However, given the scale and sensitivity of AI infrastructure, it remains uncertain whether industry momentum will shift decisively away from proprietary interconnects.

For now, ESUN is an ambitious effort, and it remains to be seen whether it can match the performance of InfiniBand.
