Xiaomi has always been known for affordable smartphones and smart home gadgets. But over the last year and a half, the company has quietly turned itself into one of the most ambitious AI players in the world.
From large language models and voice cloning to an autonomous phone agent and a massive investment war chest, Xiaomi is moving fast. Here is everything you need to know about where Xiaomi is in the big AI and LLM race.

When did Xiaomi enter the LLM race?
Xiaomi’s AI story really kicked off in April 2025, when the company released MiMo-7B, its first open-source large language model. For those unaware, the name “MiMo” stands for Xiaomi Model (Mi and Mo). From the start, Xiaomi focused on reasoning and coding rather than just chatting.
Despite having only 7 billion parameters, MiMo-7B punched well above its weight, according to Xiaomi. On math benchmarks like MATH-500, the reinforcement-learning version of the model reportedly scored 95.8%. Xiaomi also reported that it outperformed OpenAI’s o1-mini and Alibaba’s QwQ-32B-Preview on the AIME 2024 and 2025 math competitions.
The model was trained on a total of 25 trillion tokens across three training phases, including a specially curated dataset of 200 billion reasoning tokens. Xiaomi released it under an open-source MIT license, and it is available on Hugging Face.
The development team was led by Luo Fuli, who came to Xiaomi from DeepSeek.
1. MiMo-V2-Flash
[Image: Xiaomi MiMo-V2-Flash benchmark results]
In December 2025, Xiaomi announced MiMo-V2-Flash, a 309-billion-parameter model that keeps most of its weights inactive at any given moment. Thanks to a Mixture-of-Experts (MoE) design, only about 15 billion parameters are activated for each token the model processes.
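To make the Mixture-of-Experts idea above concrete, here is a minimal toy sketch of top-k routing. It is not Xiaomi's implementation: the eight "experts", the random gate scores, and the function names are all hypothetical, chosen only to show how a router can pick a few experts per input and skip the rest.

```python
import math
import random

def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the weights sum to 1.
    peak = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_scores, k):
    """Run only the k selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))

# Hypothetical toy setup: 8 "experts", each one just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
rng = random.Random(0)
gate_scores = [rng.uniform(-1, 1) for _ in experts]

routes = top_k_route(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)

# Share of parameters active at once, using the figures quoted in the article.
active_fraction = 15 / 309
print(f"experts used: {[i for i, _ in routes]}")
print(f"active share of parameters: {active_fraction:.1%}")
```

With the article's numbers, roughly one in twenty parameters does work at any moment, which is why a model this large can still respond quickly.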
What made it stand out was the combination of performance and speed. According to Xiaomi, it ranked in the top two among open-source models on reasoning benchmarks, matched GPT-5 and Claude Sonnet 4.5 on software engineering tests (SWE-Bench Verified), and could generate responses at 150 tokens per second while reportedly costing just 2.5% of Claude’s inference price. Xiaomi priced API access at $0.10 per million input tokens and offered free access for a limited time at launch.
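For a rough sense of what those numbers mean in practice, the sketch below turns the quoted input price and generation speed into a cost and a wait time. The function names are hypothetical, and output-token pricing is left out because the article does not state it.

```python
# Launch figures quoted in the article.
INPUT_PRICE_PER_MILLION_USD = 0.10
TOKENS_PER_SECOND = 150

def input_cost_usd(input_tokens, price_per_million=INPUT_PRICE_PER_MILLION_USD):
    """Cost of the prompt side of a request, in US dollars."""
    return input_tokens / 1_000_000 * price_per_million

def generation_seconds(output_tokens, tokens_per_second=TOKENS_PER_SECOND):
    """Time to stream a response at a steady generation speed."""
    return output_tokens / tokens_per_second

# Example: a 50,000-token prompt and a 1,500-token answer.
print(f"prompt cost: ${input_cost_usd(50_000):.4f}")
print(f"generation time: {generation_seconds(1_500):.1f} s")
```

Real-world speed will vary with load and prompt length, so treat the timing as a best-case estimate.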
MiMo-V2-Flash also introduced the Multi-Token Prediction (MTP) technique that lets the model generate and verify multiple tokens at once.
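The "generate and verify" idea can be illustrated with a toy draft-and-verify loop. This is a simplified sketch, not Xiaomi's MTP code: the fixed-sentence "model" and both function names are hypothetical, and a real system would check all drafted tokens in one batched pass rather than one by one.

```python
def draft_and_verify(prefix, draft_tokens, next_token):
    """Accept drafted tokens for as long as they match the main model's choice."""
    accepted = []
    for tok in draft_tokens:
        expected = next_token(prefix + accepted)
        if tok != expected:
            # First wrong guess: keep the main model's token and stop.
            accepted.append(expected)
            break
        accepted.append(tok)
    return accepted

# Hypothetical "main model": it always continues one fixed sentence.
TARGET = "the quick brown fox jumps".split()

def next_token(context):
    return TARGET[len(context)] if len(context) < len(TARGET) else "<eos>"

# Two of the three drafted tokens are right, so three tokens land in one round.
result = draft_and_verify(["the"], ["quick", "brown", "cat"], next_token)
print(result)
```

When the drafts are mostly right, several tokens are produced per round instead of one, which is where the speed-up comes from.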
2. MiMo-V2-Pro: The Trillion-Parameter Flagship
[Image: Xiaomi MiMo-V2-Pro benchmark results]
March 2026 brought Xiaomi’s most ambitious model yet. MiMo-V2-Pro has over one trillion total parameters with 42 billion active parameters per pass. It supports a context window of one million tokens, meaning it can process the equivalent of several long novels in a single conversation. Xiaomi says the model is specifically built for “agentic” tasks: complex, multi-step jobs that require planning and execution without constant human input.
The model actually first appeared on OpenRouter, the AI gateway platform, uploaded anonymously under the name “Hunter Alpha.” It quickly shot to the top of the leaderboard, processing over 1.5 trillion tokens before Xiaomi officially took credit. That kind of organic developer attention was a signal the model was genuinely competitive.
Alongside MiMo-V2-Pro, Xiaomi also dropped two companion models: MiMo-V2-Omni (a multimodal version that can process text, images, audio, and video) and MiMo-V2-TTS (a text-to-speech model for the agent pipeline).
3. MiMo-V2.5 and V2.5-Pro
In late April 2026, Xiaomi merged the best of its V2 family into a single architecture. MiMo-V2.5-Pro is a 1.02 trillion-parameter model that handles text, image, audio, and video all in one. It runs at 60 to 80 tokens per second for complex tasks, while the lighter MiMo-V2.5 (for everyday use) hits 100 to 150 tokens per second.
V2.5-Pro also ranked as the world’s top open-source model for agentic capabilities on the Artificial Analysis benchmark at the time of launch.
Xiaomi also removed additional charges for using the full 1 million-token context window and reset user credits at launch, making it more developer-friendly.
And just recently, in early June 2026, Xiaomi launched MiMo Code, a terminal-based AI coding agent based on MiMo-V2.5. Unlike most coding assistants that forget context once the window fills up, MiMo Code features a persistent memory system that keeps track of decisions across long projects.
4. MiMo-VL
On the visual side, Xiaomi released MiMo-VL (Vision-Language) and its home-focused variant, MiMo-VL-Miloco-7B. The Miloco model is designed to understand home environments.
It can recognize everyday gestures like thumbs-up, OK, peace signs, and open palms, and it can identify common household activities like watching TV, working out, or reading. It is built on a mix of supervised fine-tuning and reinforcement learning, keeping the model “home-smart” without losing general capability.
5. MiDashengLM-7B
Released in August 2025, MiDashengLM-7B is Xiaomi’s audio AI model. Unlike most voice AI systems that are trained primarily on speech recognition (which discards a lot of non-verbal audio information), this model uses a “general audio caption” approach. It was trained on a massive 38,662-hour dataset and can understand not just words, but music, environmental sounds, speaker emotion, and acoustic context.
It is built on Qwen2.5-Omni-7B from Alibaba and is embedded in Xiaomi’s electric vehicles and smart home appliances. Xiaomi released it under an Apache 2.0 license, making it available for commercial use.
6. MiMo-Audio: Hearing at Scale
Alongside its vision and language work, Xiaomi also published MiMo-Audio, a separate audio language model. The audio encoder from MiMo-Audio was later integrated into MiMo-V2.5 to power the omnimodal experience.
7. OmniVoice: Cloning Any Voice in Any Language
One of Xiaomi’s most impressive recent releases is OmniVoice, a text-to-speech model from the next-gen Kaldi team at Xiaomi’s AI Lab, open-sourced in May 2026.
OmniVoice supports 646 languages, including many low-resource languages that have very little available training data. It is a zero-shot voice cloning model, meaning it can clone a voice from just a few seconds of reference audio and generate natural-sounding speech across languages while preserving the original voice characteristics.
What sets OmniVoice apart technically is its simplified single-transformer architecture that maps text directly to acoustic tokens. This lets it complete training on 100,000 hours of audio data in a single day and run inference at up to 40x real-time speed using PyTorch.
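As a quick worked example of what "40x real-time" means, the snippet below converts a clip length into synthesis time. The function name is hypothetical, and since 40x is quoted as an upper bound, the result is a best case.

```python
def synthesis_seconds(audio_seconds, realtime_multiple=40):
    """Time needed to synthesize a clip at a given multiple of real-time speed."""
    return audio_seconds / realtime_multiple

# A one-minute clip and a one-hour audiobook chapter at the quoted peak speed.
print(f"1 minute of speech: {synthesis_seconds(60):.1f} s")
print(f"1 hour of speech:   {synthesis_seconds(3600):.0f} s")
```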
Xiaomi says OmniVoice is the first voice cloning TTS model to cover hundreds of languages. It also has practical tools for correcting tricky pronunciations, like polyphonic Chinese characters or uncommon English proper nouns. Everything is available under the Apache-2.0 license.
8. MiMo-V2.5-TTS and ASR: A Full Voice Pipeline
Alongside the broader V2.5 launch, Xiaomi also released MiMo-V2.5-TTS and an ASR (Automatic Speech Recognition) system.
The TTS model supports voice cloning, and the ASR handles bilingual recognition. Together, they let developers build end-to-end voice-driven products without having to stitch together tools from different providers.
9. Xiao AI and HyperAI: The Consumer-Facing Side
On the consumer side, Xiaomi has two main AI experiences for regular users.
Xiao AI (小爱) is Xiaomi’s long-running voice assistant, available on smartphones, smart speakers, and wearables. With HyperOS 2, it was upgraded to become “Super Xiao AI,” with better context memory, smarter home device control, and the ability to generate images from text. It is deeply integrated into HyperOS’s three-pillar system: HyperCore for performance, HyperConnect for device syncing, and HyperAI for smart features.
HyperAI, introduced globally at MWC 2025 and rolled out to phones starting with the Xiaomi 15 series, is a suite of AI features baked into HyperOS 2. It includes real-time translation, AI writing assistance, smart speech recognition that summarizes recordings, and AI photo editing. For global devices, Xiaomi also integrated Google Gemini as a backend. HyperAI has since expanded to mid-range devices, including the Redmi Note 14 Pro+ 5G and Poco series.
10. miclaw: The AI Agent That Does Things For You
The most forward-looking piece of Xiaomi’s AI puzzle is miclaw. Announced in March 2026 and currently in closed beta, miclaw is not a chatbot. It is an autonomous AI agent built on MiMo.
Rather than just answering questions, miclaw interprets what you want and then actually does it. It can open apps, navigate interfaces, fill in forms, interact with system tools, and complete multi-step tasks across your phone, all without you needing to supervise every step. This works through what Xiaomi calls an “inference-execution loop”: the AI figures out what to do, does it, checks the results, and continues until the task is complete.
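The loop described above (figure out what to do, do it, check, repeat) can be sketched in a few lines. This is only an illustration of the general pattern, not miclaw's code: the step names, the `AgentState` class, and the callback functions are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # (action, result) pairs
    done: bool = False

def run_agent(goal, plan_step, execute, is_complete, max_steps=10):
    """Plan an action, run it, check progress, and repeat until done."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        action = plan_step(state)      # inference: decide the next action
        result = execute(action)       # execution: carry it out
        state.history.append((action, result))
        if is_complete(state):         # check: has the goal been met?
            state.done = True
            break
    return state

# Hypothetical task: setting an alarm takes three UI steps.
STEPS = ["open_clock_app", "tap_new_alarm", "save_alarm"]

state = run_agent(
    goal="set an alarm for 7 am",
    plan_step=lambda s: STEPS[len(s.history)],
    execute=lambda action: "ok",
    is_complete=lambda s: len(s.history) == len(STEPS),
)
print(state.done, [action for action, _ in state.history])
```

The `max_steps` cap is a common safety measure in this kind of loop, so that an agent that never reaches its goal eventually stops.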
miclaw also has contextual memory that compresses old interactions while keeping the original intent of a task in mind. It can connect to Xiaomi’s broader smart home and car ecosystem as well.
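One simple way to picture this kind of memory is to pin the original goal, summarize older turns, and keep only the latest ones word for word. The sketch below assumes exactly that approach. Xiaomi has not published how miclaw does it, so the function, its parameters, and the placeholder summary are hypothetical.

```python
def compact_memory(goal, turns, keep_recent=3, summarize=None):
    """Keep the goal and the latest turns verbatim; fold older turns into a summary."""
    if summarize is None:
        # Placeholder summary; a real agent would ask a model to write one.
        summarize = lambda old: f"[{len(old)} earlier turns summarized]"
    if len(turns) <= keep_recent:
        return [goal] + list(turns)
    split = len(turns) - keep_recent
    old, recent = turns[:split], turns[split:]
    return [goal, summarize(old)] + list(recent)

turns = ["opened app", "searched flights", "picked date", "chose seat", "entered name"]
memory = compact_memory("book a flight to Tokyo", turns, keep_recent=2)
print(memory)
```

The goal always stays at the front, so however much history is folded away, the agent still knows what it was asked to do.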
On privacy, Xiaomi says user interactions with miclaw are not used to train AI models. Personal data is processed in real time only to execute commands, and sensitive information is handled locally on the device through what Xiaomi describes as “edge-cloud privacy computing.”
The current closed beta supports the Xiaomi 17 series. According to Xiaomi, HyperOS 4 will fully integrate miclaw at the system level.
miclaw has also been tested as a smartwatch assistant, running through the Xiaomi Health app. Users press and hold a button to speak, and the response is processed on the connected phone and displayed on the watch.
11. The Money Behind It All
In March 2026, Xiaomi CEO Lei Jun announced the company would invest at least $8.7 billion in AI over the next three years. That is on top of the company’s already-rising R&D budgets. As a result, Xiaomi’s annual R&D spend is projected to hit around 40 billion yuan ($5.7 billion) in 2026.
The payoff is becoming visible. By early April 2026, Xiaomi’s models had captured around 21% of all traffic on OpenRouter, the AI routing platform. Lei Jun has also said the company is aiming for a “grand convergence” in 2026, bringing its own chip, its own OS, and its own AI model together in a single device.
12. What This All Means
Before April 2025, Xiaomi had not released an open-source large language model. Today, it has a full stack: reasoning models, vision-language models, audio models, a voice cloning system, a TTS/ASR pipeline, an AI agent, and consumer AI features reaching millions of devices.
The pace at which Xiaomi is developing and releasing these models is striking, to say the least. And the fact that most of them are open-source helps Xiaomi build real developer momentum fast.
The big test ahead is whether miclaw and HyperOS 4 can make all this AI actually useful in people’s daily lives. If they can, Xiaomi will not just be a phone company that does AI on the side. It will be a genuine AI platform.
Stay tuned to Gizmochina for the latest updates on Xiaomi’s AI journey.

