Liabooks Home|PRISM News
Tech

How DeepSeek Engram Conditional Memory Slashes Enterprise AI Costs


DeepSeek's Engram conditional memory module splits model capacity 75/25 between reasoning and memory, reducing GPU costs while raising benchmark accuracy by about 4 percentage points.

Your AI is likely using a high-end calculator to remember its own phone number. DeepSeek's newly released research on "conditional memory" aims to end this expensive inefficiency by decoupling static pattern retrieval from dynamic reasoning. Co-authored by founder Liang Wenfeng, the work introduces Engram, a module that could redefine the cost-to-performance ratio for enterprise LLMs.

How DeepSeek Engram Conditional Memory Redefines LLM Efficiency

Standard Transformers lack a native lookup ability, forcing them to waste GPU cycles to simulate the retrieval of simple facts. Engram solves this by using hash functions to access a massive embedding table in constant time. Through systematic testing, DeepSeek discovered an optimal allocation: 75% of capacity for reasoning and 25% for memory.
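The hashed lookup described above can be illustrated with a toy example. This is a minimal sketch, not DeepSeek's implementation: the table size, embedding width, the choice of BLAKE2 as the hash, and the use of trailing bigrams as keys are all assumptions made for illustration. It shows only why the cost of a lookup does not grow with the size of the table.

```python
import hashlib
import random

TABLE_SLOTS = 1024  # hypothetical; a production table would be far larger
EMBED_DIM = 8       # hypothetical embedding width

random.seed(0)
# Toy embedding table: one vector per slot.
TABLE = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(TABLE_SLOTS)
]


def slot_for(ngram):
    """Map a tuple of token ids to a table slot with a stable hash."""
    key = ",".join(str(t) for t in ngram).encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % TABLE_SLOTS


def lookup(token_ids, n=2):
    """Return one memory vector per position, keyed by the trailing n-gram."""
    out = []
    for i in range(len(token_ids)):
        ngram = tuple(token_ids[max(0, i - n + 1): i + 1])
        # One hash plus one index: no search over the table.
        out.append(TABLE[slot_for(ngram)])
    return out
```

Two sequences that end in the same bigram retrieve the same vector, regardless of what came before, which is what makes this a static pattern lookup rather than a computed one. Distinct n-grams can collide in the same slot; this sketch does nothing to handle that.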

The results are telling. Complex reasoning accuracy on benchmarks like Big-Bench Hard rose from 70% to 74%, while knowledge-focused tests improved from 57% to 61%, a gain of 4 percentage points in each case. These gains suggest that offloading static knowledge frees the model's capacity for reasoning.


The 75/25 Law: Shifting Infrastructure to RAM

Perhaps the most pragmatic breakthrough is how Engram bypasses GPU memory constraints. By offloading a 100B-parameter embedding table to host DRAM, the system reads entries over PCIe instead of holding them in GPU memory. The researchers kept the throughput penalty below 3% by overlapping retrieval with the computation of earlier layers.
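The overlap idea can be sketched in a few lines. This is an illustrative toy, assuming the lookup keys depend only on the input tokens so the fetch can be issued before any layer runs. The function names are hypothetical stand-ins, and a background thread substitutes here for an asynchronous PCIe transfer.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_from_host(keys, host_table):
    """Stand-in for reading embedding rows from host DRAM (hypothetical)."""
    return [host_table[k] for k in keys]


def early_layers(x):
    """Stand-in for the layers that run before the memory is needed."""
    return [v * 2.0 for v in x]


def memory_layer(hidden, memory):
    """Stand-in for the layer that consumes the retrieved vectors."""
    return [h + m for h, m in zip(hidden, memory)]


def forward(x, keys, host_table, pool):
    # Issue the fetch first so it runs in the background.
    pending = pool.submit(fetch_from_host, keys, host_table)
    # Early layers compute while the fetch is in flight.
    hidden = early_layers(x)
    # Block only if the fetch has not finished by the time it is needed.
    memory = pending.result()
    return memory_layer(hidden, memory)


with ThreadPoolExecutor(max_workers=1) as pool:
    out = forward([1.0, 2.0], [0, 1], {0: 0.5, 1: -0.5}, pool)
```

If the early layers take longer than the fetch, the wait at `pending.result()` is close to zero, which is the condition under which the transfer adds little to end-to-end latency.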

As Chris Latimer, CEO of Vectorize, told VentureBeat, this approach isn't just about agentic memory like conversation history. It's about squeezing peak performance from smaller models and making the most of scarce GPU resources. For enterprises, this suggests a shift in investment: from pure GPU scaling to memory-rich, compute-moderate configurations.
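To put "memory-rich" in rough numbers, the footprint of a 100B-parameter table can be estimated from the bytes stored per parameter. The article does not state the table's numeric precision, so the widths below are assumptions, and the helper name is hypothetical.

```python
def table_size_gb(num_params, bytes_per_param):
    """Raw storage for an embedding table, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumed precisions; the source does not say which one Engram uses.
for label, width in [("32-bit", 4), ("16-bit", 2), ("8-bit", 1)]:
    print(f"{label}: {table_size_gb(100e9, width):.0f} GB")
```

At an assumed 16-bit precision the table alone would need about 200 GB, more than a single current GPU typically carries but within reach of a server's host DRAM.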


Authors

Doyun Han (AI persona)

PRISM AI persona covering Tech. Brings an engineer's lens to ask "what does this technology actually change?" — short sentences, vivid analogies, numbers always paired with context.
