
How DeepSeek Engram Conditional Memory Slashes Enterprise AI Costs


DeepSeek's Engram conditional memory module splits model capacity 75/25 between reasoning and memory, cutting GPU costs while improving benchmark accuracy by up to 4 percentage points.

Your AI is likely using a high-end calculator to remember its own phone number. DeepSeek's newly released research on "conditional memory" aims to end this expensive inefficiency by decoupling static pattern retrieval from dynamic reasoning. Co-authored by founder Liang Wenfeng, the work introduces Engram, a module that could redefine the cost-to-performance ratio for enterprise LLMs.

How DeepSeek Engram Conditional Memory Redefines LLM Efficiency

Standard Transformers lack a native lookup ability, forcing them to waste GPU cycles to simulate the retrieval of simple facts. Engram solves this by using hash functions to access a massive embedding table in constant time. Through systematic testing, DeepSeek discovered an optimal allocation: 75% of capacity for reasoning and 25% for memory.
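The mechanism can be sketched in a few lines. This is an illustrative toy, not DeepSeek's implementation: the table size, embedding dimension, and token ids are all assumptions. The key property is that an n-gram hashes to a fixed slot in the table, so retrieval is a single O(1) index operation rather than a pass through attention layers.

```python
import hashlib
import numpy as np

# Toy stand-in for a multi-billion-slot embedding table (assumed sizes).
TABLE_SIZE = 1_000_000
EMBED_DIM = 64

rng = np.random.default_rng(0)
table = rng.standard_normal((TABLE_SIZE, EMBED_DIM)).astype(np.float32)

def ngram_slot(tokens):
    """Hash an n-gram of token ids to a fixed table slot (constant time)."""
    key = ",".join(map(str, tokens)).encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % TABLE_SIZE

def lookup(tokens):
    """Retrieve the memory vector for an n-gram with one array index."""
    return table[ngram_slot(tokens)]

vec = lookup([101, 2054, 2003])   # hypothetical token ids
assert vec.shape == (EMBED_DIM,)
```

Because the slot is a pure function of the n-gram, the same static fact always resolves to the same vector, which is exactly the lookup behavior a standard Transformer has to simulate with compute.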

The results are telling. Complex reasoning accuracy on benchmarks like Big-Bench Hard jumped from 70% to 74%, while knowledge-focused tests improved from 57% to 61%. These gains demonstrate that offloading static knowledge actually frees the model to think more clearly.

The 75/25 Law: Shifting Infrastructure to RAM

Perhaps the most pragmatic breakthrough is how Engram bypasses GPU memory constraints. By offloading a 100B-parameter embedding table to host DRAM, the system accesses information via PCIe without bottlenecking the GPU. The researchers achieved throughput penalties below 3% by overlapping retrieval with earlier computation layers.
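The overlap trick can be illustrated with a minimal prefetch pipeline. All names, sizes, and the double-buffer depth below are assumptions for the sketch, not DeepSeek's code: a background thread gathers rows from a host-resident table while the "earlier layers" of the model run, so the transfer latency is hidden behind compute.

```python
import threading
import queue
import numpy as np

# Host-DRAM-resident embedding table (toy size; the paper's is ~100B params).
TABLE = np.random.default_rng(0).standard_normal((10_000, 64)).astype(np.float32)

def prefetch(slot_batches, out_q):
    """Background thread: gather table rows, simulating the PCIe transfer."""
    for slots in slot_batches:
        out_q.put(TABLE[slots])
    out_q.put(None)  # sentinel: no more batches

def run_model(slot_batches):
    fetched = queue.Queue(maxsize=2)  # double buffering: fetch one batch ahead
    threading.Thread(target=prefetch, args=(slot_batches, fetched),
                     daemon=True).start()
    outputs = []
    while (rows := fetched.get()) is not None:
        # Earlier compute layers would run here while the next fetch is
        # in flight; the retrieved rows feed the later memory layer.
        outputs.append(float(rows.sum()))
    return outputs

results = run_model([np.array([0, 1]), np.array([2, 3])])
```

The design point is that the lookup's inputs are known before the memory layer needs its output, so retrieval can be issued early; that lookahead is what keeps the reported throughput penalty under 3%.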

As Chris Latimer, CEO of Vectorize, told VentureBeat, this approach isn't just about agentic memory like conversation history. It's about squeezing peak performance from smaller models and making the most of scarce GPU resources. For enterprises, this suggests a shift in investment: from pure GPU scaling to memory-rich, compute-moderate configurations.

This content is AI-generated based on source articles. While we strive for accuracy, errors may occur. We recommend verifying with the original source.
