- DeepSeek’s Engram separates static memory from computation, increasing efficiency in large AI models
- The method reduces high-bandwidth memory needs by letting DeepSeek models look up static knowledge instead of recomputing it
- Engram supports asynchronous prefetching on multiple GPUs with minimal performance overhead
DeepSeek, in collaboration with Peking University, introduced a new training method called Engram, designed to decouple memory storage from computational processes.
Traditional large language models require high-bandwidth memory for knowledge retrieval and basic computing, creating a bottleneck in both performance and cost.
This HBM bottleneck is widely cited as a key reason DRAM prices rose roughly fivefold in just ten weeks, as demand for hardware to support large AI models skyrocketed.
Validation and technical approach
The researchers said that existing models waste sequential depth on trivial operations, which could otherwise support higher-level reasoning.
Engram allows models to efficiently “search” essential information without overloading GPU memory, freeing up capacity for more complex reasoning tasks.
The system was tested on a 27 billion parameter model and showed measurable improvements on industry standard benchmarks.
By performing knowledge retrieval through hashed N-gram lookups, Engram gives the model access to static memory without depending on the broader context.
The retrieved information is then adjusted using a context-aware gating mechanism to align with the hidden state of the model.
This design allows models to handle long context inputs more efficiently and supports system-level prefetching with minimal performance overhead.
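The article doesn't spell out the architecture, but the core idea can be sketched in a few lines of PyTorch: hashed N-gram IDs index a static embedding table, and a sigmoid gate computed from the hidden state decides how much of the retrieved vector to mix back in. The class name, the polynomial hash, and the gating layer below are illustrative assumptions, not DeepSeek's actual design.

```python
import torch
import torch.nn as nn


class EngramStyleLookup(nn.Module):
    """Toy hashed N-gram memory with context-aware gating (illustrative only)."""

    def __init__(self, hidden_dim: int, num_slots: int, ngram: int = 2):
        super().__init__()
        self.ngram = ngram
        self.num_slots = num_slots
        # Static memory table: rows are fetched by hashed N-gram IDs rather
        # than recomputed by the transformer layers on every forward pass.
        self.memory = nn.Embedding(num_slots, hidden_dim)
        # Gate decides how much retrieved memory to blend into the hidden state.
        self.gate = nn.Linear(hidden_dim, hidden_dim)

    def hash_ngrams(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len). Cheap rolling polynomial hash over the
        # last `ngram` tokens; the lookup key depends only on those local tokens.
        hashed = token_ids.clone()
        for i in range(1, self.ngram):
            shifted = torch.roll(token_ids, shifts=i, dims=1)
            shifted[:, :i] = 0  # positions without a full N-gram
            hashed = hashed * 1000003 + shifted
        return hashed % self.num_slots

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        slot_ids = self.hash_ngrams(token_ids)       # deterministic lookup keys
        retrieved = self.memory(slot_ids)            # (batch, seq_len, hidden_dim)
        gate = torch.sigmoid(self.gate(hidden))      # context-aware gating
        return hidden + gate * retrieved             # blend static memory into the state


# Quick shape check with hypothetical sizes:
layer = EngramStyleLookup(hidden_dim=64, num_slots=50_000)
tokens = torch.randint(0, 32_000, (2, 16))
hidden = torch.randn(2, 16, 64)
assert layer(tokens, hidden).shape == hidden.shape
```

Because the lookup key is a deterministic function of the tokens themselves, the expensive part of the operation is a memory gather rather than extra matrix multiplications, which is what lets the retrieval live outside the compute-bound path.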
The Engram method complements other hardware-efficient approaches, including solutions like Phison’s AI inference accelerators.
Engram minimizes the amount of high-speed memory required by using static information lookups, making memory usage more efficient.
Phison offers a cost-effective way to expand total memory capacity with SSDs, supporting memory-hungry approaches such as Engram or Mixture-of-Experts systems.
Combined, these approaches allow AI systems to optimize rapid memory usage while affordably increasing overall memory capacity.
It also works in conjunction with emerging Compute Express Link (CXL) standards, which aim to overcome GPU memory bottlenecks in large-scale AI workloads.
The method separates static pattern storage from dynamic computation, improving the Transformer backbone without increasing FLOPs or parameter counts.
DeepSeek formalized a U-shaped expansion rule to optimize the parameter mapping between the MoE conditional calculation module and the Engram memory module.
Tests show that reallocating about 20% to 25% of the sparse parameter budget to Engram produces better performance than pure MoE models, with stable gains across different scales.
Memory slot expansion provides predictable improvements without additional computational cost.
This confirms the scalability of conditional memory as an independent axis for sparse models.
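As a rough illustration of those two claims, the toy calculation below splits a hypothetical 27-billion sparse parameter budget at an assumed 22% midpoint of the reported 20% to 25% range, and shows why growing the slot count adds capacity without adding per-token compute. The hidden dimension and FLOP estimate are placeholder assumptions, not reported figures.

```python
def split_sparse_budget(total_sparse_params: float, engram_share: float = 0.22):
    """Move a share of the sparse parameter budget from MoE experts to Engram
    memory slots. The 22% figure is an assumed midpoint of the 20-25% range."""
    engram_params = total_sparse_params * engram_share
    return total_sparse_params - engram_params, engram_params


def lookup_flops_per_token(hidden_dim: int) -> int:
    """Toy estimate: a hashed lookup is a gather plus a gated add, so per-token
    compute scales with hidden_dim only, not with the number of memory slots."""
    return 2 * hidden_dim


moe, engram = split_sparse_budget(27e9)  # hypothetical 27B sparse budget
print(f"MoE experts: {moe / 1e9:.1f}B | Engram memory: {engram / 1e9:.1f}B")

for slots in (1_000_000, 10_000_000, 100_000_000):
    capacity = slots * 4096 / 1e9        # memory-table parameters at hidden_dim=4096
    print(f"{slots:>11,} slots -> {capacity:6.1f}B table params, "
          f"{lookup_flops_per_token(4096)} FLOPs/token (flat)")
```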
Engram’s deterministic retrieval mechanism allows memory capacity to scale linearly across multiple GPUs while supporting asynchronous prefetching during inference.
It offloads static knowledge recall from the lower layers, freeing attention mechanisms to focus on the global context.
Hierarchical caching of frequently used embeddings improves efficiency, and the module works with existing GPU and system memory architectures, potentially avoiding costly HBM upgrades.
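One plausible way to picture this, assuming a host-resident table, a small on-GPU row cache, and a side CUDA stream (none of which are confirmed details of Engram), is the sketch below: because lookup keys are deterministic functions of tokens already seen, rows can be copied from system DRAM ahead of time while the GPU keeps computing.

```python
import collections

import torch


class PrefetchingMemory:
    """Illustrative host-resident memory table with a small on-GPU row cache and
    asynchronous prefetch on a side CUDA stream; not DeepSeek's implementation."""

    def __init__(self, num_slots: int, hidden_dim: int, cache_rows: int = 4096):
        use_cuda = torch.cuda.is_available()
        # Full table lives in (pinned) host DRAM instead of scarce HBM.
        self.table = torch.randn(num_slots, hidden_dim, pin_memory=use_cuda)
        self.cache: "collections.OrderedDict[int, torch.Tensor]" = collections.OrderedDict()
        self.cache_rows = cache_rows
        self.device = "cuda" if use_cuda else "cpu"
        self.stream = torch.cuda.Stream() if use_cuda else None

    def prefetch(self, slot_ids: torch.Tensor) -> None:
        """Start host-to-GPU copies for upcoming slot IDs on a side stream so
        they overlap with compute running on the default stream."""
        if self.stream is None:
            return
        with torch.cuda.stream(self.stream):
            for sid in slot_ids.unique().tolist():
                if sid not in self.cache:
                    self.cache[sid] = self.table[sid].to(self.device, non_blocking=True)
                    if len(self.cache) > self.cache_rows:
                        self.cache.popitem(last=False)  # evict the oldest cached row

    def gather(self, slot_ids: torch.Tensor) -> torch.Tensor:
        """Wait for outstanding prefetches, then return rows for a 1-D batch of IDs."""
        if self.stream is not None:
            torch.cuda.current_stream().wait_stream(self.stream)
        rows = []
        for sid in slot_ids.tolist():
            row = self.cache.get(sid)
            if row is None:                     # cache miss: synchronous fallback copy
                row = self.table[sid].to(self.device)
            rows.append(row)
        return torch.stack(rows)
```

Since the slot IDs for the next step are known before that step runs, `prefetch` can overlap the copies with compute, and `gather` only waits on the side stream rather than on additional HBM capacity.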
This technique can relieve pressure on expensive memory hardware, particularly in regions like China, where access to HBM from suppliers such as Samsung, SK Hynix, and Micron is constrained.
Engram’s early validation suggests that models can expand parameter scaling and reasoning ability while managing memory demands more efficiently.
This approach can help alleviate memory limitations in AI infrastructure, potentially reducing sharp price swings for DDR5 DRAM.
Via SCMP