
- The system links distant facilities to run massive training workloads continuously.
- High-speed fiber keeps GPUs active, avoiding data-movement bottlenecks.
- A two-story chip layout increases computing density and reduces latency.
Microsoft has unveiled its first AI Superfactory, connecting large AI data centers in Wisconsin and Atlanta via a dedicated fiber network designed for high-speed movement of training data.
The design places chips close together across two floors to increase density and reduce latency.
It also relies on extensive cabling and liquid cooling systems engineered to handle the weight and heat of densely packed hardware.
A network created for training large-scale models
In a blog post, Microsoft said this setup will support large AI workloads that differ from the smaller, more isolated tasks common in cloud environments.
“It’s about building a distributed network that can act as a virtual supercomputer to address the world’s biggest challenges,” said Alistair Speirs, Microsoft’s general manager focused on Azure infrastructure.
“The reason we call this an AI superfactory is that it runs complex work across millions of pieces of hardware… it’s not just one site training an AI model, it’s a network of sites supporting that work.”
The AI WAN (wide area network) moves data across thousands of miles of dedicated fiber, some newly built and some repurposed from previous acquisitions.
Network protocols and architecture have been adjusted to shorten paths and keep data moving with minimal delay.
Microsoft says this allows distant sites to cooperate in the same model training process in near real time, with each location contributing its share of the calculation.
The focus is on keeping a very large number of GPUs continuously busy, so that no GPU sits idle waiting for results from another location.
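
Microsoft has not published the software stack behind this, but the underlying principle (overlapping gradient communication with ongoing computation so no GPU idles) is standard in data-parallel training. The minimal sketch below illustrates it with PyTorch's DistributedDataParallel, which all-reduces gradient buckets while the backward pass is still running; the model, sizes, and hyperparameters are illustrative placeholders, not anything Azure-specific.

```python
# Minimal sketch of data-parallel training in which gradient communication
# overlaps with computation, so GPUs are not left idle waiting on the network.
# Illustrates the general technique only; names here are standard PyTorch,
# not Microsoft's actual (unpublished) training stack.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train():
    # Each process drives one GPU (in principle, in a different facility)
    # and joins the same job; rank and world size come from the launcher.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in model
    # DDP buckets gradients and all-reduces each bucket as soon as it is
    # ready, overlapping network transfer with the rest of the backward
    # pass -- the "keep every GPU busy" principle described above.
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 4096, device=local_rank)  # synthetic batch
        loss = ddp_model(x).square().mean()
        optimizer.zero_grad(set_to_none=True)
        loss.backward()   # gradient all-reduce overlaps with this call
        optimizer.step()  # every rank now holds identical weights

    dist.destroy_process_group()

if __name__ == "__main__":
    train()  # launch with: torchrun --nproc_per_node=<gpus> this_script.py
```

At cluster scale the same pattern applies; how well the communication hides behind computation then depends on the interconnect, which is exactly what the dedicated fiber network is meant to guarantee.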
“Leading in AI isn’t just about adding more GPUs, it’s about building the infrastructure that makes them work together as a single system,” said Scott Guthrie, executive vice president of Cloud + AI at Microsoft.
Microsoft uses the Fairwater design to support high-performance rack systems, including Nvidia GB200 NVL72 racks built to scale to very large clusters of Blackwell GPUs.
The company combines this hardware with liquid cooling systems that send heated fluid out of the building and return it at lower temperatures.
Microsoft claims that operational cooling uses almost no new water, other than periodic replacement when necessary for chemical control.
The Atlanta site mirrors the Wisconsin design, giving Microsoft a consistent architecture across regions as more facilities come online.
“To make improvements in AI capabilities, you need to have a growing infrastructure to train it,” said Mark Russinovich, CTO, deputy CISO and technical fellow at Microsoft Azure.
“The amount of infrastructure needed now to train these models is not just one data center, not two, but multiple.”
The company positions these sites as purpose-built for training advanced AI models, citing growing parameter counts and ever-larger training data sets as the key pressures driving expansion.
The facilities incorporate exabytes of storage and millions of CPU cores to support tasks surrounding the core training workflows.
Microsoft suggests this scale is necessary for partners such as OpenAI and its own AI Superintelligence team to continue developing their models.