China built a giant AI supercomputer without Nvidia GPUs, using millions of Huawei Arm cores instead

Huawei-Linked LineShine Supercomputer Packs 2.45 Million Arm Cores Into Huge AI Cluster
Huawei processors power one of the largest AI computing facilities in China today
CPU-only supercomputers eliminate costly data transfers between processors and accelerators during workloads

China has deployed a massive CPU-only supercomputer called LineShine that delivers 1.54 exaflops of AI training performance without using any GPUs.

The system includes 20,480 compute nodes, each containing two LX2 processors for a total of 40,960 chips across the machine.

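Each LX2 processor has 304 CPU cores, and the system is reported to use approximately 2.45 million Armv9 cores in total. That total does not follow directly from the per-chip figures, since 40,960 chips at 304 cores each would come to roughly 12.45 million cores.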
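The node, chip, and core counts can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this article; the variable and function names are illustrative. It shows what the per-node and per-chip numbers multiply out to, and how many 304-core chips the reported core total would correspond to. It does not settle which of the published figures is the accurate one.

```python
# Figures quoted in the article.
NODES = 20_480
CHIPS_PER_NODE = 2
CORES_PER_CHIP = 304
REPORTED_TOTAL_CORES = 2_450_000  # "approximately 2.45 million"


def total_chips(nodes: int, chips_per_node: int) -> int:
    """Number of processors across the whole machine."""
    return nodes * chips_per_node


def total_cores(chips: int, cores_per_chip: int) -> int:
    """Number of CPU cores implied by the chip count."""
    return chips * cores_per_chip


chips = total_chips(NODES, CHIPS_PER_NODE)
cores = total_cores(chips, CORES_PER_CHIP)

# How many 304-core chips the reported core total would correspond to.
chips_implied_by_report = REPORTED_TOTAL_CORES / CORES_PER_CHIP

print(f"chips in system:           {chips:,}")
print(f"cores from chip count:     {cores:,}")
print(f"chips implied by 2.45M:    {chips_implied_by_report:,.0f}")
```

The chip count matches the article's 40,960, while the core count computed from it is about five times the reported total.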

Inside the unusual architecture of the LX2 processor

The processor was developed either by Huawei or through a joint design with China’s National Supercomputing Center, but its exact origin has not been disclosed.

Each LX2 processor uses two computing chiplets with cores organized into eight groups containing 38 cores per group.

Each core includes Arm’s Scalable Vector Extension and Scalable Matrix Extension units, which accelerate the matrix operations used in AI training.

The processor delivers 60.3 teraflops of FP64 performance, 240 teraflops of BF16 performance, and 960 teraops of INT8 performance from a single chip.

The memory subsystem combines 32 GB of integrated HBM, offering up to 4 TB/s of bandwidth, with up to 256 GB of external DDR5 memory.
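To put the per-chip numbers in proportion, the following sketch derives a few ratios from the figures quoted above: throughput per core, the speedup from dropping to lower precision, HBM bandwidth relative to FP64 compute, and the maximum memory attached to one chip. All inputs are the article's peak figures, the names are illustrative, and real workloads will not reach these peaks.

```python
# Per-chip figures quoted in the article (peak values).
FP64_TFLOPS = 60.3
BF16_TFLOPS = 240.0
INT8_TOPS = 960.0
CORES_PER_CHIP = 304
HBM_GB = 32
HBM_BANDWIDTH_TB_PER_S = 4.0  # "up to 4 TB/s"
DDR5_MAX_GB = 256             # "up to 256 GB"

# Peak FP64 throughput per core, in gigaflops.
fp64_per_core_gflops = FP64_TFLOPS * 1000 / CORES_PER_CHIP

# Throughput gain from each step down in precision.
bf16_to_fp64 = BF16_TFLOPS / FP64_TFLOPS
int8_to_bf16 = INT8_TOPS / BF16_TFLOPS

# Bytes of HBM bandwidth available per peak FP64 operation.
# TB/s divided by Tflop/s gives bytes per flop directly.
hbm_bytes_per_fp64_flop = HBM_BANDWIDTH_TB_PER_S / FP64_TFLOPS

# Largest memory pool a single chip can address (HBM plus DDR5).
max_memory_per_chip_gb = HBM_GB + DDR5_MAX_GB

print(f"FP64 per core:        {fp64_per_core_gflops:.1f} Gflops")
print(f"BF16 vs FP64:         {bf16_to_fp64:.2f}x")
print(f"INT8 vs BF16:         {int8_to_bf16:.2f}x")
print(f"HBM bytes per flop:   {hbm_bytes_per_fp64_flop:.4f}")
print(f"Max memory per chip:  {max_memory_per_chip_gb} GB")
```

The DDR5 pool is up to eight times the size of the HBM, which is where the large-memory argument for CPU-only designs comes from.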

CPU-only systems offer several advantages for complex scientific tasks that combine AI training with massive data ingestion and preprocessing.

Since everything runs on the same processor and memory space, they avoid costly and bandwidth-intensive CPU-to-GPU data transfers.
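A rough way to see why those transfers matter is to compare the time a separate copy step would take with the time needed to read the same data once from on-package memory. The sketch below is a back-of-the-envelope model only. The 4 TB/s HBM figure comes from this article, while the batch size and the host-to-accelerator link bandwidth are assumptions chosen for illustration and should be replaced with real numbers for any actual system.

```python
def transfer_seconds(data_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to move data_gb gigabytes at the given bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return data_gb / bandwidth_gb_per_s


# ASSUMPTION: a hypothetical batch sized to fill one chip's 32 GB of HBM.
BATCH_GB = 32.0

# ASSUMPTION: illustrative host-to-accelerator link bandwidth.
# This number is not from the article.
ASSUMED_HOST_LINK_GB_PER_S = 64.0

# From the article: HBM offers up to 4 TB/s.
HBM_GB_PER_S = 4000.0

# Extra copy a CPU-plus-GPU system performs before the accelerator can start.
copy_time = transfer_seconds(BATCH_GB, ASSUMED_HOST_LINK_GB_PER_S)

# One pass over the same data read directly from HBM.
hbm_read_time = transfer_seconds(BATCH_GB, HBM_GB_PER_S)

print(f"copy over assumed link: {copy_time * 1000:.1f} ms")
print(f"one pass from HBM:      {hbm_read_time * 1000:.1f} ms")
print(f"ratio:                  {copy_time / hbm_read_time:.1f}x")
```

Under these assumed numbers the copy takes far longer than a single pass over the data in HBM. In practice, GPU systems hide part of that cost by overlapping transfers with computation, so the model gives an upper bound on the penalty, not a measurement.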

Homogeneous CPU-based systems can also expose much larger coherent memory pools by combining HBM with large DDR capacities.

This is useful for handling massive scientific data sets, retrieval-augmented generation, and long context windows that limited GPU memory cannot easily accommodate.

The big caveat with this approach

CPU-only systems are typically less energy efficient and deliver lower AI performance density than GPU-based supercomputers.

This is the main reason why most of the industry is betting on heterogeneous CPU and GPU architectures for large-scale AI workloads.

China is going down this path largely because of US bans on GPU exports, not because CPU-only systems are technically superior for AI tasks.

LineShine shows that CPUs can successfully perform GPU work, but the efficiency gap between the two approaches remains substantial and is unlikely to close anytime soon.

China is making a strategic trade-off, accepting lower performance and higher power consumption in exchange for independence from foreign hardware and software ecosystems like Nvidia’s GPUs and CUDA.

Whether that trade-off makes sense for long-term AI development depends entirely on how quickly Chinese manufacturers can close the performance gap with their own GPU designs.

Until then, LineShine will remain a notable engineering achievement and practical necessity, but it probably won’t be a model for how most of the world will build AI supercomputers.

Via Tom's Hardware
