Meta found a way to turn retired server RAM into a cheap hyperscale memory expansion without buying new DRAM

Meta recycled reclaimed DDR4 memory instead of buying expensive new DRAM
CXL technology turned discarded server memory into usable capacity
Meta reported up to 25% fewer servers for machine learning inference workloads

Memory shortages, rising DRAM prices, and longer delivery times have pushed hyperscalers toward alternatives that seemed impractical not long ago. Meta has developed a way to reuse old DDR4 memory pulled from decommissioned servers instead of discarding it.

The approach lets companies expand server memory capacity without purchasing new DRAM, a cost the researchers describe as the “RAM tax”.

This expansion is made possible by Compute Express Link (CXL), which lets older DDR4 modules work alongside newer DDR5 memory in the same server.

Reuse Old Memory Instead of Buying New Memory

Meta describes the approach as near-zero-cost memory expansion that also significantly reduces e-waste and infrastructure emissions.

The strategy comes at a time when memory supply limitations continue to impact server deployment schedules in cloud computing environments around the world.

According to Meta researchers, existing CXL implementations fell short because expanded memory offered only about a tenth of the bandwidth of local memory.

The company also reported latency roughly 60% higher than that of memory attached directly to the processor sockets.
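How much that 60% penalty matters depends on how much traffic actually reaches the expansion tier. The back-of-the-envelope Python sketch below assumes a simple weighted-average model in which every CXL access costs 1.6 times a local access; the 100 ns local latency and the access fractions are hypothetical values chosen for illustration, not Meta's measurements, and `blended_latency` is an invented name.

```python
# Back-of-the-envelope model (an assumption, not Meta's method): average memory
# latency is a weighted mix of local-DRAM and CXL accesses.
CXL_LATENCY_MULTIPLIER = 1.6  # ~60% higher latency on CXL, per the reported figure


def blended_latency(local_ns: float, cxl_fraction: float) -> float:
    """Return the average access latency when `cxl_fraction` of accesses hit CXL memory."""
    if not 0.0 <= cxl_fraction <= 1.0:
        raise ValueError("cxl_fraction must be between 0 and 1")
    cxl_ns = local_ns * CXL_LATENCY_MULTIPLIER
    return (1.0 - cxl_fraction) * local_ns + cxl_fraction * cxl_ns


if __name__ == "__main__":
    local_ns = 100.0  # hypothetical local DRAM latency in nanoseconds
    for fraction in (0.0, 0.1, 0.25, 0.5):
        avg = blended_latency(local_ns, fraction)
        print(f"{fraction:.0%} of accesses on CXL -> {avg:.1f} ns average")
```

In this model, if only 10% of accesses land on the expansion tier, average latency rises by just 6%, which is why keeping frequently used data in local memory matters so much.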

Another limitation was that commercial CXL products paired their controllers with their own DRAM modules, which prevented practical reuse of existing DDR4 inventories at scale.

Meta responded by developing an in-house ASIC called Vistara, designed specifically for DRAM reuse, power efficiency, and low latency.

The accompanying software stack automatically determines the appropriate memory ratio for each workload and disables expansion for workloads that cannot tolerate the added latency.

“We addressed these challenges through hardware and software co-design. On the hardware side, we designed an in-house CXL ASIC, Vistara, optimized for DRAM reuse, power efficiency, and low latency,” Meta said.

“On the software side, we created an optimized solution based on TPP (transparent page placement), determined the appropriate ratio of local to expanded memory for each workload, and automated configuration per workload, including disabling expanded memory for workloads that cannot tolerate increased latency.”
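The quote describes two software jobs: keeping frequently used pages in fast local memory (the idea behind TPP) and choosing, or refusing, an expansion configuration per workload. The Python sketch below is a toy model of both ideas under stated assumptions; it is not Meta's code. Real TPP operates inside the Linux kernel and moves pages between NUMA nodes, whereas here a page is just an integer with an access count. `WorkloadProfile`, `MemoryConfig`, `choose_config`, `place_pages`, and the rule that expansion is skipped when the hot set cannot fit locally are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Hypothetical per-workload inputs a configuration tool might use."""
    name: str
    working_set_gb: float    # memory the workload needs in total
    latency_sensitive: bool  # True if it cannot tolerate slower memory
    hot_fraction: float      # share of the working set accessed frequently (0..1)


@dataclass
class MemoryConfig:
    use_cxl: bool
    local_gb: float
    cxl_gb: float


def choose_config(w: WorkloadProfile, local_capacity_gb: float,
                  cxl_capacity_gb: float) -> MemoryConfig:
    """Pick a local/CXL split for one workload (toy policy, not Meta's)."""
    local_gb = min(w.working_set_gb, local_capacity_gb)
    # Latency-intolerant workloads get local memory only.
    if w.latency_sensitive:
        return MemoryConfig(False, local_gb, 0.0)
    # If even the hot pages cannot fit locally, expansion would put hot
    # data on the slow tier, so this toy policy disables it.
    if w.working_set_gb * w.hot_fraction > local_capacity_gb:
        return MemoryConfig(False, local_gb, 0.0)
    # Otherwise spill the remainder (cold pages) onto CXL, up to its capacity.
    cxl_gb = min(w.working_set_gb - local_gb, cxl_capacity_gb)
    return MemoryConfig(cxl_gb > 0, local_gb, cxl_gb)


def place_pages(access_counts: dict[int, int],
                local_slots: int) -> tuple[set[int], set[int]]:
    """TPP-like idea in miniature: hottest pages go local, the rest go to CXL."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return set(ranked[:local_slots]), set(ranked[local_slots:])


if __name__ == "__main__":
    cache = WorkloadProfile("cache", 384, False, 0.3)
    critical = WorkloadProfile("latency_critical", 128, True, 0.9)
    for w in (cache, critical):
        print(w.name, choose_config(w, local_capacity_gb=256, cxl_capacity_gb=256))
    print(place_pages({1: 50, 2: 5, 3: 20, 4: 1}, local_slots=2))
```

Separating the per-workload decision from per-page placement reflects the two roles the quote describes: one is set once per workload, the other keeps adjusting as access patterns change.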

Meta says the architecture has shown enough practical value to justify deployment in production environments that run a wide variety of workloads every day.

Meta reported that the deployment cut the number of servers needed for disaggregated machine learning inference workloads by up to 25%.
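Why does more memory per server mean fewer servers? For a workload whose fleet size is set by memory capacity, the server count is roughly the total memory required divided by the memory each server provides. The Python sketch below assumes a purely memory-bound workload; the 100,000 GB total and the 512 GB local plus 256 GB CXL per server are hypothetical figures, not Meta's, and `servers_needed` is an invented helper.

```python
import math


def servers_needed(total_memory_gb: float, local_gb_per_server: float,
                   cxl_gb_per_server: float = 0.0) -> int:
    """Servers required if memory capacity is the only constraint (assumption)."""
    per_server = local_gb_per_server + cxl_gb_per_server
    if per_server <= 0:
        raise ValueError("each server must provide some memory")
    return math.ceil(total_memory_gb / per_server)


if __name__ == "__main__":
    total = 100_000  # hypothetical fleet-wide memory demand in GB
    before = servers_needed(total, 512)       # local DRAM only
    after = servers_needed(total, 512, 256)   # plus recycled DDR4 over CXL
    print(before, after, f"{1 - after / before:.0%} fewer servers")
```

Real fleets are also constrained by CPU, network bandwidth, and latency targets, so actual savings depend on which limit is reached first.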

Distributed cache systems reportedly saw average latency fall by approximately 29%, despite relying in part on slower recycled memory.

The findings suggest that additional capacity can matter more than raw memory speed when applications are limited more by memory scarcity than by response times.
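A toy cache model shows how that can happen. Average request latency is the hit rate times the cost of a hit, plus the miss rate times the cost of fetching from a slower backend. In the Python sketch below, all numbers are hypothetical: adding CXL capacity is assumed to raise the hit rate from 90% to 95% while making the average hit 20% slower, and a miss is assumed to cost 100 times a local hit. `avg_request_latency` is an invented name.

```python
def avg_request_latency(hit_rate: float, hit_cost: float, miss_cost: float) -> float:
    """Expected latency per request for a cache in front of a slower backend."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


if __name__ == "__main__":
    # Hypothetical units: a local-memory hit costs 1.0, a backend miss costs 100.0.
    local_only = avg_request_latency(hit_rate=0.90, hit_cost=1.0, miss_cost=100.0)
    # Larger cache (local + CXL): more hits, but the average hit is 20% slower.
    with_cxl = avg_request_latency(hit_rate=0.95, hit_cost=1.2, miss_cost=100.0)
    print(f"local only: {local_only:.2f}, with CXL: {with_cxl:.2f}")
```

Here the slower hits are more than offset by fewer trips to the backend. The effect reverses if the extra capacity barely improves the hit rate, which is one reason a per-workload policy may switch expansion off.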

Interestingly, the same interconnect technology that has caught Meta's attention is also drawing interest from semiconductor companies around the world that are building large-scale accelerator systems.

The broader ecosystem includes companies seeking alternatives to proprietary interconnects such as Nvidia's NVLink.

Among these efforts is Ultra Accelerator Link (UALink), an independent initiative backed by AMD, AWS, Google, Microsoft, and Meta to connect accelerators from different hardware vendors.

Within Meta’s own testing, disaggregated machine learning inference systems and distributed caching infrastructure were the two workloads directly examined by the researchers.

Both saw measurable improvements from the recycled memory approach, with inference systems requiring fewer servers and caches experiencing lower average latency.

Whether recycling DDR4 through CXL becomes standard practice will likely depend on whether the performance tradeoffs remain acceptable outside hyperscale environments.

Via Blocks and Files
