
DeepSeek V4 Squeezes Million-Token Context Into 10% of V3.2’s Memory, Escalating China’s AI Efficiency War With OpenAI

Chinese artificial intelligence lab DeepSeek claims that its latest V4 model significantly reduces the computing and memory resources required for inference, according to its release notes. DeepSeek says V4 needs just 27% of the single-token inference FLOPs and 10% of the key-value (KV) cache of its predecessor, DeepSeek V3.2. The smaller KV cache directly lowers memory requirements, freeing memory and expanding the context length available to those building on the model.

How DeepSeek V4 Slashes Compute and Memory Costs

In its release notes for DeepSeek V4, DeepSeek outlines that the […]
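As a rough illustration of why a 10x KV-cache reduction matters at million-token context, here is a back-of-the-envelope sketch. All architecture numbers below (layer count, head count, head dimension) are hypothetical placeholders for illustration only, not DeepSeek's actual configuration:

```python
# Back-of-the-envelope KV-cache size for a transformer decoder.
# All parameters are hypothetical, NOT DeepSeek V3.2/V4's real architecture.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    # One key and one value vector per layer, per KV head, per token;
    # dtype_bytes=2 assumes fp16/bf16 storage.
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical 60-layer model with 8 KV heads of dimension 128,
# serving a 1M-token context.
baseline = kv_cache_bytes(layers=60, kv_heads=8, head_dim=128, seq_len=1_000_000)
compressed = baseline // 10  # the claimed ~10% of the predecessor's KV cache

print(f"baseline:   {baseline / 2**30:.1f} GiB per sequence")
print(f"compressed: {compressed / 2**30:.1f} GiB per sequence")
```

Even under these made-up numbers, the baseline cache runs to hundreds of GiB per million-token sequence, so a 10x reduction is the difference between a context that fits on a single accelerator and one that does not.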

Read full article at https://wccftech.com/deepseek-v4-cuts-kv-cache-by-90-at-1m-tokens-but-aggressive-compression-could-risk-needle-in-a-haystack-failures/