Next-generation high-bandwidth flash (HBF) memory promises to feed AI accelerators far larger data sets, changing the way GPUs handle massive workloads



  • HBF offers roughly ten times the capacity of HBM, but remains slower than DRAM
  • GPUs will access larger data sets through tiered HBM-HBF memory
  • HBF's write endurance is limited, so software must be designed around read-heavy access
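The tiering described above can be sketched as a toy model: a small, fast HBM-like tier caches reads from a large, write-limited HBF-like tier. The class and its capacity/budget parameters are illustrative assumptions, not any vendor's API.

```python
from collections import OrderedDict

class TieredMemory:
    """Toy model of a tiered HBM-HBF setup (hypothetical, for
    illustration): a small fast tier caches reads from a large,
    write-limited slow tier."""

    def __init__(self, fast_capacity, write_budget):
        self.fast = OrderedDict()          # small HBM-like cache (LRU order)
        self.slow = {}                     # large HBF-like store
        self.fast_capacity = fast_capacity
        self.write_budget = write_budget   # stand-in for limited write endurance

    def write(self, key, value):
        # Writes to the slow tier are rationed, mirroring HBF's
        # limited write endurance.
        if self.write_budget <= 0:
            raise RuntimeError("HBF write budget exhausted")
        self.write_budget -= 1
        self.slow[key] = value

    def read(self, key):
        # Reads are the common path: serve from the fast tier when
        # possible, otherwise pull from the slow tier and cache it.
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        value = self.slow[key]
        self.fast[key] = value
        if len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)  # evict least-recently-read entry
        return value
```

In this sketch, capacity pressure only ever evicts from the fast tier; the slow tier keeps everything, which is the point of pairing a huge flash tier with a small HBM cache.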

The explosion of AI workloads has put unprecedented pressure on memory systems, forcing companies to rethink how they deliver data to accelerators.

High-bandwidth memory (HBM) has served as a fast cache for GPUs, allowing AI tools to read and process key-value (KV) data efficiently.


