

DeepSeek V4 Squeezes Million-Token Context Into 10% of V3.2’s Memory, Escalating China’s AI Efficiency War With OpenAI

24 April 2026 at 22:24

Chinese artificial intelligence lab DeepSeek claims its latest V4 model significantly reduces the computing and memory resources required for token inference, according to its release notes. DeepSeek says the V4 model needs just 27% of the single-token inference FLOPs and 10% of the key-value (KV) cache of its predecessor, DeepSeek V3.2. The smaller KV cache directly lowers memory consumption, conserving capacity and expanding the context available to builders working with the model.

How DeepSeek V4 Slashes Compute and Memory Costs

In its release notes for DeepSeek V4, DeepSeek outlines that the […]
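To get a feel for what a 90% KV-cache reduction means at a million-token context, here is a minimal back-of-the-envelope sketch. The layer count, head count, and head dimension below are hypothetical placeholders, not DeepSeek's disclosed architecture; only the 10% ratio comes from the reported release notes.

```python
# Illustrative KV-cache sizing. All model dimensions are assumed
# placeholders, NOT DeepSeek's published figures; only the 10%
# reduction ratio is taken from the reported release notes.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size of a standard KV cache: keys + values for every layer,
    stored at 2 bytes per value (fp16/bf16)."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value

TOKENS = 1_000_000            # the million-token context from the headline
baseline = kv_cache_bytes(TOKENS, layers=60, kv_heads=8, head_dim=128)
v4_claimed = baseline * 0.10  # V4 reportedly needs 10% of V3.2's KV cache

gib = 1024 ** 3
print(f"Hypothetical V3.2-style cache: {baseline / gib:.1f} GiB")
print(f"At the claimed 10%:            {v4_claimed / gib:.1f} GiB")
```

Under these assumed dimensions, a conventional cache would run to roughly 229 GiB at one million tokens, while a 90% reduction brings it under 23 GiB, the difference between needing a multi-GPU node and fitting on a single accelerator.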

Read full article at https://wccftech.com/deepseek-v4-cuts-kv-cache-by-90-at-1m-tokens-but-aggressive-compression-could-risk-needle-in-a-haystack-failures/

DeepSeek previews new AI model that β€˜closes the gap’ with frontier models

24 April 2026 at 17:30
DeepSeek says both models are more efficient and performant than DeepSeek V3.2 due to architectural improvements, and have almost "closed the gap" with current leading models, both open and closed, on reasoning benchmarks.