Xiaomi MiMo-V2-Flash LLM Just Dropped: These Are the Most Interesting Things About It
Xiaomi has unveiled its most advanced open-source large language model to date, called MiMo-V2-Flash, as part of its expanding push into foundation AI. The new model focuses on high-speed performance and an efficient architecture, with strong capabilities in reasoning and code generation.
Xiaomi positions MiMo-V2-Flash as a direct competitor to leading models such as DeepSeek V3.2 and Claude 4.5 Sonnet. Let's take a closer look at how the model works, its key features, and how to access it.

Purpose-Built for Speed and Agents
MiMo-V2-Flash is a Mixture-of-Experts (MoE) model with 309 billion total parameters and 15 billion active parameters. The model is purpose-built for AI agent scenarios and multi-turn interactions that require fast inference.
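To see why the total and active parameter counts differ so much, here is a toy Python sketch of Mixture-of-Experts routing: a router sends each token to only a couple of experts, so only a small slice of the weights does any work per token. The dimensions, expert count, and top-k value below are made-up illustration values, not MiMo-V2-Flash's actual configuration.

```python
# Toy Mixture-of-Experts routing: only TOP_K of NUM_EXPERTS experts run per token,
# which is why "active" parameters are a small fraction of total parameters.
# All sizes here are illustrative, not the real model's.
import numpy as np

rng = np.random.default_rng(0)
D, NUM_EXPERTS, TOP_K = 16, 8, 2            # toy dimensions; the real model is far larger

router_w = rng.normal(size=(D, NUM_EXPERTS))
experts = [rng.normal(size=(D, D)) for _ in range(NUM_EXPERTS)]   # one toy FFN per expert

def moe_forward(x):
    """Route a single token vector through its top-k experts only."""
    scores = x @ router_w                    # router logits, one per expert
    top = np.argsort(scores)[-TOP_K:]        # indices of the best-scoring experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()
    # Only the selected experts are evaluated for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

out = moe_forward(rng.normal(size=D))
print(out.shape, f"active experts per token: {TOP_K}/{NUM_EXPERTS}")
```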

Xiaomi uses a 1:5 hybrid attention architecture, which combines Global Attention and Sliding Window Attention (SWA) with a 128-token window. The native context length is 32,000 tokens, and the model is trained with support for up to 256,000 tokens.
This design helps MiMo-V2-Flash maintain high efficiency while scaling across long-context tasks. Xiaomi claims it delivers output faster than several leading models, including DeepSeek and Claude, while maintaining lower operational costs.
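To illustrate the idea behind the 1:5 hybrid layout, the short sketch below interleaves one global-attention layer with every five sliding-window layers and builds a 128-token windowed causal mask. The layer count, ordering, and mask details are illustrative assumptions rather than the published architecture.

```python
# A minimal sketch of a 1:5 global/sliding-window layer pattern and an SWA mask.
# Layer count and ordering are assumptions; only the 128-token window and the
# 1:5 ratio come from Xiaomi's description.
import numpy as np

NUM_LAYERS = 48          # assumed layer count, for illustration only
SWA_WINDOW = 128         # sliding-window size cited by Xiaomi
GLOBAL_EVERY = 6         # one global layer per five SWA layers -> 1:5 ratio

def layer_types(num_layers: int) -> list[str]:
    """Assign each layer global or sliding-window attention in a 1:5 pattern."""
    return ["global" if i % GLOBAL_EVERY == 0 else "swa" for i in range(num_layers)]

def causal_mask(seq_len: int, window: int | None) -> np.ndarray:
    """Boolean mask: True where a query position may attend to a key position."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q                          # never attend to future tokens
    if window is None:                       # global attention: full causal mask
        return causal
    return causal & (q - k < window)         # SWA: only the most recent `window` tokens

print(layer_types(12))                       # ['global', 'swa', 'swa', 'swa', 'swa', 'swa', 'global', ...]
print(causal_mask(6, 3).astype(int))         # tiny window so the mask is readable
```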
Benchmark Performance and Pricing
Benchmark results show MiMo-V2-Flash performing at the top tier across various domains. The model ranks in the top two among open-source models in reasoning tasks such as AIME 2025 and GPQA-Diamond.
In software engineering benchmarks like SWE-Bench Verified and SWE-Bench Multilingual, it outperforms other open-source models and reaches levels comparable to GPT-5 and Claude 4.5 Sonnet.
Xiaomi has priced the API at $0.10 per million input tokens and $0.30 per million output tokens, and it is currently free to use for a limited time. According to the company, MiMo-V2-Flash generates responses at 150 tokens per second at roughly 2.5% of Claude's inference cost.
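For a sense of scale, here is a quick back-of-the-envelope calculation at those listed rates. The token counts in the example are invented illustration values, not figures from Xiaomi.

```python
# Rough request-cost arithmetic at the listed rates ($0.10/M input, $0.30/M output).
# The token counts below are made-up example numbers.
INPUT_RATE = 0.10 / 1_000_000    # dollars per input token
OUTPUT_RATE = 0.30 / 1_000_000   # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a long agent turn: 20k tokens of context in, 2k tokens generated
print(f"${request_cost(20_000, 2_000):.4f}")   # -> $0.0026
```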
Technical Innovations Inside
The architecture includes Multi-Token Prediction (MTP), which lets the model draft several future tokens in parallel and verify them before they are emitted. This increases decoding throughput without adding attention or memory overhead. Xiaomi reports that with three MTP layers, the model achieves a 2.0x to 2.6x speedup over standard token-by-token decoding.
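The runnable toy below sketches the draft-and-verify idea behind this kind of decoding: a cheap head proposes a few tokens, the main model checks them, and the matching prefix is committed in one step. The stand-in "main model" and "draft head" functions and the greedy acceptance rule are illustrative assumptions, not Xiaomi's published implementation.

```python
# Toy draft-and-verify decoding in the spirit of multi-token prediction.
# Both "models" below are deterministic stand-ins so the sketch actually runs.
def main_model_next(tokens):
    """Toy ground-truth next token: a deterministic function of the context."""
    return (sum(tokens) * 31 + len(tokens)) % 1000

def draft_next(tokens):
    """Toy cheap draft head: usually agrees with the main model, sometimes not."""
    guess = main_model_next(tokens)
    return guess if len(tokens) % 7 else (guess + 1) % 1000   # inject occasional misses

def decode_with_mtp(prompt, max_new=20, draft_len=3):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1. Cheaply draft several future tokens (the MTP layers' job).
        draft, ctx = [], list(tokens)
        for _ in range(draft_len):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify the drafts against the main model; keep the matching prefix,
        #    plus the model's own token at the first mismatch.
        ctx = list(tokens)
        for d in draft:
            true_t = main_model_next(ctx)
            tokens.append(true_t)
            ctx.append(true_t)
            if true_t != d:
                break
        # On average more than one token is committed per main-model step.
    return tokens

print(decode_with_mtp([1, 2, 3]))
```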

Xiaomi also introduced a new post-training method called Multi-Teacher Online Policy Distillation (MOPD). The technique uses multiple teacher models to guide the student through token-level rewards in an on-policy learning process. It allows the model to achieve high performance with less than 1/50th of the training resources needed in traditional RL pipelines. MOPD also supports plug-and-play teachers, enabling continuous self-improvement cycles.
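Xiaomi has not published MOPD's exact loss, but the toy sketch below conveys the general shape of multi-teacher, on-policy distillation: the student samples its own rollout, each teacher scores every sampled token, and a per-token reward (here, the best teacher log-probability) weights the student update. The reward choice and loss form are assumptions made for illustration.

```python
# Toy multi-teacher on-policy distillation with token-level rewards.
# The reward definition and loss below are illustrative assumptions, not MOPD itself.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy "policies": fixed logits per position for a 5-token rollout.
student_logits = rng.normal(size=(5, VOCAB))
teacher_logits = [rng.normal(size=(5, VOCAB)) for _ in range(2)]   # two teachers

# 1. On-policy: the rollout is sampled from the *student*, not copied from a teacher.
probs = softmax(student_logits)
rollout = [rng.choice(VOCAB, p=p) for p in probs]

# 2. Token-level rewards: how much the best-suited teacher likes each student token.
rewards = np.array([
    max(np.log(softmax(t[i])[tok]) for t in teacher_logits)
    for i, tok in enumerate(rollout)
])

# 3. Reward-weighted update signal on the student's own samples.
logp_student = np.array([np.log(probs[i][tok]) for i, tok in enumerate(rollout)])
loss = -(rewards * logp_student).mean()
print("per-token rewards:", np.round(rewards, 2), "loss:", round(float(loss), 3))
```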
How to Access It?
Xiaomi has launched a web AI chat interface called MiMo Studio at aistudio.xiaomimimo.com, allowing users to interact directly with the model. The service supports web search, agent workflows, and code generation. It also features a toggle for switching between instant replies and slower "thinking" responses for deeper reasoning.
The model can generate functional HTML web pages and integrates well with development tools like Claude Code and Cursor. Xiaomi has also showcased creative and functional web demos.
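For programmatic access, the sketch below assumes the API follows the common OpenAI-compatible chat protocol; Xiaomi has not been quoted on the exact interface here, and the base URL and model identifier are placeholders to replace with whatever the official documentation specifies.

```python
# A hedged access sketch assuming an OpenAI-compatible chat endpoint.
# Base URL, API key, and model name are placeholders, not confirmed values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-xiaomi-endpoint.com/v1",   # placeholder endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="mimo-v2-flash",   # assumed model identifier
    messages=[{"role": "user", "content": "Summarize sliding window attention in two sentences."}],
)
print(resp.choices[0].message.content)
```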
Fully Open-Source
MiMo-V2-Flash is fully open-source under the MIT license. Model weights are available on Hugging Face, and all inference code is published on GitHub.
The company contributed inference code to SGLang on launch day and aims to grow developer adoption by offering transparent, low-cost access to high-performance AI tools.
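Since the weights are on Hugging Face, loading them locally should follow the usual transformers workflow, sketched below. The repository ID is an assumption based on the model name; check the official Hugging Face page for the actual identifier and hardware requirements.

```python
# A hedged sketch of loading the open weights with Hugging Face transformers.
# "XiaomiMiMo/MiMo-V2-Flash" is an assumed repo id; verify it before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-V2-Flash"   # assumed repository id
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

prompt = "Write a Python function that reverses a string."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```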
MiMo-V2-Flash reflects Xiaomi's shift toward becoming a serious player in the AI space. It brings competitive reasoning, fast code generation, and efficient agent deployment to the open-source ecosystem.
In related AI news, China has equipped traffic police with AI-powered smart glasses for real-time vehicle inspections, while a separate report highlights how even so-called "all-AI companies" still require human oversight due to limits in autonomous decision-making.
For more daily updates, please visit our News Section.
Stay ahead in tech! Join our Telegram community and sign up for our daily newsletter of top stories!