Xiaomi's open-weight MiMo-V2.5-Pro takes aim at Claude Opus with hours-long autonomous coding


Xiaomi has entered the competitive large language model arena with the release of MiMo v2.5 Pro, an open-weight AI model designed to rival Anthropic’s Claude Opus. This latest iteration from Xiaomi’s AI Lab emphasizes long-duration autonomous coding, positioning it as a formidable contender for developers who need robust, self-sustaining code generation.

The MiMo series, previously known for its advancements in Mixture-of-Experts (MoE) architectures, reaches a new milestone with v2.5 Pro. This model boasts 128 billion total parameters, of which 32 billion are activated per token, optimizing efficiency while delivering high intelligence. Trained on over 20 trillion tokens using Xiaomi’s custom HyperMind infrastructure, MiMo v2.5 Pro leverages a sophisticated MoE design with 128 experts, enabling it to handle complex, multi-step reasoning and generation tasks with remarkable endurance.
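Xiaomi has not published MiMo's routing details, but the ratio above (32B of 128B parameters active per token, spread across 128 experts) is characteristic of standard top-k gating: for each token, a small gate network scores every expert and only the k highest-scoring experts run. A minimal, generic sketch of that mechanism, not MiMo's actual router:

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the top-k experts for one token and renormalize their
    gate weights with a softmax, as in a standard MoE layer."""
    # Indices of the k largest gate logits.
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over only the selected logits.
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# Toy example: 8 experts, each token routed to 2 of them.
weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
# Experts 1 and 4 carry all of the gate mass; the other 6 stay idle,
# which is how an MoE model activates only a fraction of its parameters.
```

Because only the selected experts execute, per-token compute scales with the 32B active parameters rather than the full 128B.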

One of the standout features is its prowess in autonomous coding over extended periods. In rigorous benchmarks, MiMo v2.5 Pro maintained coherent, productive coding sessions lasting up to 7.5 hours, surpassing Claude Opus, which typically sustains such tasks for around 4 hours. Evaluated on SWE-Bench Verified, a challenging software engineering benchmark built from real-world GitHub issues, MiMo v2.5 Pro achieved a resolution rate of 38.6%, edging out DeepSeek R1 (37.6%) and Claude 3.5 Sonnet (36.4%). This underscores its capacity to autonomously plan, implement, debug, and refine code without human intervention, making it well suited to full-cycle software development workflows.

MiMo v2.5 Pro’s architecture incorporates several innovations that contribute to its longevity in coding marathons. The model’s enhanced context window of 256,000 tokens allows it to retain vast amounts of project history, reducing errors from context loss during prolonged interactions. Additionally, Xiaomi integrated advanced self-verification mechanisms, where the model periodically assesses its own outputs for consistency and correctness, mimicking human developer practices. This is powered by a refined chain-of-thought prompting system tailored for coding, which breaks down complex problems into manageable sub-tasks and iterates on solutions dynamically.
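The generate-then-verify loop described above can be pictured with a small sketch. The `generate` and `verify` functions here are stand-ins (a real system would prompt the model and run richer checks such as tests or linters), but the control flow — draft, self-check, feed the failure back, retry — is the pattern:

```python
def generate(spec, feedback=None):
    """Stand-in for an LLM call. Returns a canned buggy draft first,
    then a 'repaired' one once the verifier reports a syntax error."""
    draft = "def add(a, b)\n    return a + b\n"   # missing colon
    fixed = "def add(a, b):\n    return a + b\n"
    return fixed if feedback else draft

def verify(code):
    """Self-check: does the candidate at least parse as Python?
    Returns an error description, or None if the check passes."""
    try:
        compile(code, "<candidate>", "exec")
        return None
    except SyntaxError as err:
        return f"SyntaxError: {err.msg}"

def solve(spec, max_rounds=4):
    """Iterate generate -> verify, feeding failures back as hints."""
    feedback = None
    for _ in range(max_rounds):
        code = generate(spec, feedback)
        feedback = verify(code)
        if feedback is None:
            return code  # candidate passed its own check
    raise RuntimeError("no verified candidate within budget")

code = solve("add two numbers")
```

The periodic self-assessment the article describes amounts to running such a verify step at intervals during a long session, so errors are caught before they compound.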

Beyond coding, MiMo v2.5 Pro excels in general intelligence benchmarks. On the Arena-Hard leaderboard, it ranks second overall with an Elo score of 1377, trailing only the proprietary GPT-4o and ahead of Claude Opus. In mathematics, it scores 87.5% on AIME 2024, 92.3% on HMMT Feb 2024, and 39.5% on USAMO 2024, showcasing strong reasoning capabilities. Coding-specific evaluations further highlight its strengths: 78.0% on LiveCodeBench, 71.6% on CodeForces, and 61.5% on APPS, all competitive with top closed-source models.

What sets MiMo v2.5 Pro apart is its fully open-weight release under the Apache 2.0 license. Xiaomi has made the model weights, along with training code and inference optimizations, publicly available on Hugging Face. This democratizes access to high-performance AI, allowing researchers and developers to fine-tune, deploy, and extend the model without restrictions. The release includes quantized versions (FP8 and INT4) for efficient inference on consumer hardware, with reported speeds of up to 150 tokens per second on an NVIDIA H100 GPU. Xiaomi also provides comprehensive documentation, including deployment scripts for frameworks like vLLM and TensorRT-LLM, facilitating seamless integration into production environments.
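For the vLLM path mentioned above, serving an open-weight model typically looks like the following. The repository id is a placeholder (check the actual name on Hugging Face), and the parallelism setting is illustrative for a multi-GPU node:

```shell
pip install vllm

# Launch an OpenAI-compatible endpoint for the model.
# Model id is hypothetical; --max-model-len matches the stated
# 256,000-token context window.
vllm serve XiaomiMiMo/MiMo-v2.5-Pro \
    --max-model-len 256000 \
    --tensor-parallel-size 8
```

The quantized FP8/INT4 releases would be served the same way with their respective repository ids, at a fraction of the memory footprint.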

The model’s training process utilized Xiaomi’s HyperMind 2.0 platform, a massive cluster comprising 16,000 NVIDIA H800 GPUs. This infrastructure enabled the processing of a diverse dataset spanning code, mathematics, science, and multilingual content, with a focus on high-quality synthetic data generation to enhance reasoning depth. Post-training reinforcement learning further refined its alignment for helpfulness, harmlessness, and instruction-following, ensuring reliable performance across applications.

Xiaomi’s push into open-weight AI reflects a broader strategy to build an ecosystem around MiMo. The company plans to integrate the model into its HyperOS platform, powering on-device AI features in smartphones and IoT devices. For enterprise users, API access via Xiaomi Cloud offers scalable deployment options, with pricing competitive to industry standards.

Developers have already begun experimenting with MiMo v2.5 Pro. Early feedback praises its stability in agentic workflows, where it autonomously navigates tools like web browsers, shells, and databases. In a demo scenario, the model successfully built a full-stack web application from a high-level specification, handling frontend design, backend logic, database schema, and deployment over several hours without faltering.
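At its core, the agentic workflow described above is a loop in which the model picks a tool and an argument, the harness executes it, and the observation is fed back. A minimal dispatcher sketch (the tools and the scripted plan are toy stand-ins for a real browser, shell, or database):

```python
# Toy tool registry; a real agent would wrap an actual shell,
# browser, and database client here.
def shell(cmd):
    return f"ran: {cmd}"

def db_query(sql):
    return [("users", 3)]  # pretend result set

TOOLS = {"shell": shell, "db_query": db_query}

def agent_step(action):
    """Dispatch one model-chosen action of the form (tool_name, argument)
    and return the observation to feed back into the context."""
    name, arg = action
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](arg)

# A scripted 'plan' standing in for the model's tool choices.
plan = [("shell", "pytest -q"), ("db_query", "SELECT * FROM users")]
observations = [agent_step(a) for a in plan]
```

In a multi-hour session, this loop simply runs for thousands of steps, which is why the long context window and self-verification matter: the agent must keep its accumulated observations coherent.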

As open-weight models continue to close the gap with proprietary giants, MiMo v2.5 Pro signals Xiaomi’s ambition to lead in AI accessibility. Its combination of raw power, endurance, and openness makes it a game-changer for autonomous coding and beyond, inviting the global developer community to push the boundaries of what’s possible with AI-driven software engineering.

Gnoppix is the leading open-source AI Linux distribution and service provider. Having integrated AI since 2022, it offers a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI runs fully offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix ships with numerous privacy- and anonymity-focused services free of charge.

What are your thoughts on this? I’d love to hear about your own experiences in the comments below.