MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders

amu · June 1, 2026, 1:40pm

MiniMax has released M3, an open-weight AI model with a 1-million-token context window that directly challenges proprietary leaders such as GPT-4 and Claude. The model uses a Mixture of Experts architecture with 500 billion total parameters (45 billion active per token) and handles both text and images. The release escalates the open-weight AI arms race by giving developers a long-context model of a kind that was previously the domain of closed, paid APIs.

The Million-Token Context Window

M3’s context length is its headline feature. The model can process roughly 750,000 words in a single prompt, which is more than the full text of “War and Peace.” This allows it to analyze entire books, long legal documents, or extensive codebases without chunking or summarization.
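The word count above implies a ratio of about 0.75 words per token. A minimal sketch, assuming that ratio holds for your text (it varies by language and tokenizer) and using hypothetical helper names, shows how to check whether a document is likely to fit before sending it:

```python
# Rough fit check for a 1M-token window.
# WORDS_PER_TOKEN is an assumption taken from the article's own ratio
# (1,000,000 tokens ~ 750,000 words); a real tokenizer will differ.
WORDS_PER_TOKEN = 0.75
CONTEXT_WINDOW = 1_000_000


def estimate_tokens(text: str) -> int:
    """Estimate the token count from a whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """True if the estimated prompt plus an output budget fits the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW
```

For anything near the limit, count tokens with the model's actual tokenizer instead of relying on this estimate.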

“A million-token context means you can feed the model an entire code repository and ask for a bug fix — it sees the whole codebase at once.”

Minimax claims M3 achieves near-perfect “needle in a haystack” retrieval accuracy even at maximum context length, a benchmark that many long-context models fail.
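For readers unfamiliar with the test, a needle-in-a-haystack check hides one fact inside a long filler text and asks the model to retrieve it. The sketch below only builds the prompt and scores answers; the function names are hypothetical, and the scoring rule (substring match) is a simplifying assumption rather than the method behind the reported results.

```python
def build_haystack_prompt(filler_sentences, needle, depth, question):
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences) + "\n\nQuestion: " + question


def retrieval_accuracy(answers, expected):
    """Fraction of model answers that contain the expected string."""
    if not answers:
        return 0.0
    hits = sum(expected.lower() in answer.lower() for answer in answers)
    return hits / len(answers)


# Example: a tiny haystack with the needle placed halfway through.
filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The secret code is 7421."
prompt = build_haystack_prompt(filler, needle, 0.5, "What is the secret code?")
```

A full evaluation repeats this across many depths and context lengths, up to the maximum window, and reports accuracy for each combination.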

Performance Benchmarks

M3 posts competitive scores across standard evaluations. On MMLU (multitask language understanding) it matches GPT-4 and outperforms some open rivals. On coding tasks (HumanEval, MBPP) it also scores well. The model excels in multi-image reasoning, where it can analyze several photographs side by side.

Math and logic: Scores above 85% on GSM8K and MATH datasets.
Long-context QA: Outperforms Claude 3.5 Sonnet on the RULER benchmark for 128k+ token inputs.
Vision-language: Comparable to GPT-4V on VQAv2 and OCR tasks.

However, the model slightly trails GPT-4 Turbo and Gemini Ultra on some creative writing tasks, where proprietary models still hold an edge.

Open-Weight vs. Proprietary

Minimax released M3 under a custom open-weight license — weights are freely downloadable, but the model is not fully open source. The license allows commercial use but restricts redistribution of derived models to competitors with over 100 million monthly active users.

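This middle ground keeps the technology accessible for startups and researchers while protecting MiniMax’s commercial interests. It sits between fully open-source releases and closed models such as Google’s Gemini. Meta’s Llama 3.1 takes a similar approach: its weights are downloadable, but its community license adds conditions for very large platforms.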

“The open-weight approach gives developers real power without the full freedom of open source. Train your own fine-tunes, but don’t expect to launch a competing foundation model.”

Technical Architecture

M3 uses a Mixture of Experts (MoE) design with 500B total parameters. Only 45B (9%) are activated per token, which keeps per-token compute manageable even though all expert weights still have to be stored. The model also employs a hybrid attention mechanism that combines dense attention for short-range context with sparse attention for long-range information.
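To illustrate why only a fraction of the parameters runs per token, here is a minimal sketch of top-k expert routing in plain Python. The number of experts, the value of k, and the renormalized-softmax gating are all assumptions for illustration; the article does not describe M3's actual router.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    weights = route_token(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Share of parameters active per token, from the article's figures.
ACTIVE_FRACTION = 45 / 500
```

Each expert here is just a callable. In a real model the experts are feed-forward blocks, and the router logits come from a learned linear layer applied to the token's hidden state.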

Training data included over 5 trillion tokens, primarily English and Chinese text. The model supports multilingual input across 20+ languages, though performance varies for low-resource languages.

Availability and Usage

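M3 is available on Hugging Face and via MiniMax’s API. The quoted requirement of approximately 90 GB of GPU memory (FP16) matches the 45B active parameters at 2 bytes each. Keeping all 500B parameters resident in FP16 would take roughly 1 TB, so real deployments need a multi-GPU node or aggressive quantization and offloading. Even 90 GB is more than a single A100 80GB or two RTX 4090s (48 GB combined) can hold. A quantized version (INT8), described as fitting on a single 40GB GPU, is also provided; by the same arithmetic, INT8 needs about 1 byte per parameter.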
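A quick way to sanity-check memory figures like these is weights-only arithmetic: parameter count times bytes per parameter. The sketch below assumes 2 bytes per FP16 parameter and 1 byte per INT8 parameter, and it ignores the KV cache and activations, which grow with context length. Under those assumptions, 90 GB corresponds to the 45B active parameters in FP16, while all 500B parameters in FP16 come to about 1,000 GB.

```python
# Bytes per parameter for common storage formats (weights only).
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}


def weight_memory_gb(params_billions, dtype):
    """Weight storage in GB (1 GB = 1e9 bytes), excluding KV cache and activations."""
    return params_billions * BYTES_PER_PARAM[dtype]


active_fp16 = weight_memory_gb(45, "fp16")   # 45B active parameters
total_fp16 = weight_memory_gb(500, "fp16")   # all 500B parameters
total_int8 = weight_memory_gb(500, "int8")   # all 500B parameters, quantized
```

Because an MoE router can select any expert for any token, all expert weights normally need to be reachable at inference time, so the total figure is the one that determines hardware sizing.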

Local deployment follows the usual pattern: download the weights, serve them with vLLM or Hugging Face Transformers, and start prompting. No internet connection is needed after the download.
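As a rough illustration of the vLLM route, the sketch below loads weights from a local directory and generates one completion. The model path, GPU count, and context length are placeholders, not values confirmed for M3, and whether a given vLLM release supports the architecture is also an assumption. The `vllm` import is deferred so the configuration helper can be inspected on a machine without GPUs.

```python
import os


def engine_kwargs(model_path, num_gpus=8, max_model_len=131_072):
    """Build keyword arguments for vllm.LLM; all values are placeholders."""
    return {
        "model": model_path,               # local directory with downloaded weights
        "tensor_parallel_size": num_gpus,  # shard the model across this many GPUs
        "max_model_len": max_model_len,    # cap the context to limit KV-cache memory
    }


def generate(prompt, model_path, **overrides):
    """Run one offline completion with vLLM (requires vllm and suitable GPUs)."""
    # Tell the Hugging Face libraries not to reach the network.
    os.environ["HF_HUB_OFFLINE"] = "1"
    from vllm import LLM, SamplingParams  # deferred: heavy, GPU-dependent import

    llm = LLM(**engine_kwargs(model_path, **overrides))
    params = SamplingParams(temperature=0.2, max_tokens=512)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


# Usage (not executed here):
# text = generate("Summarize this repository.", "/models/m3", num_gpus=8)
```

Starting with a smaller `max_model_len` and raising it gradually is a practical way to find the longest context the available memory can support.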

Implications for the AI Landscape

MiniMax, a Chinese company, positions M3 as a direct competitor to U.S. frontier models. The million-token context window has been a key differentiator for Claude and Gemini. Now it is available to anyone with enough GPU capacity to host the weights.

Enterprise use: Analyze entire contracts, audit logs, or scientific papers in one pass.
Research: Long-context reasoning unlocks new experiments in retrieval, summarization, and agentic workflows.
Startups: No per-token API costs for long prompts — run locally or on cheap cloud GPUs.

The model still has limits: it is less effective on highly nuanced creative tasks and its safety alignment is minimal compared to Claude. Developers must implement their own guardrails for sensitive use cases.
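What a guardrail looks like depends on the use case. As a deliberately simple sketch, the wrapper below screens both the prompt and the reply against a pattern list. The patterns, the refusal text, and the `generate_fn` callable are all hypothetical; production systems usually combine such filters with classifier-based moderation.

```python
import re

# Illustrative patterns only; a real policy would be far broader.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # number shaped like a US SSN
    re.compile(r"(?i)\bpassword\s*[:=]"),    # credential-looking text
]


def violates_policy(text):
    """True if any blocked pattern appears in the text."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def guarded_generate(prompt, generate_fn, refusal="Request declined by policy filter."):
    """Check the prompt, call the model, then check the reply."""
    if violates_policy(prompt):
        return refusal
    reply = generate_fn(prompt)
    if violates_policy(reply):
        return refusal
    return reply
```

Here `generate_fn` stands in for whatever function calls the locally hosted model, so the filter stays independent of the serving stack.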

The Bottom Line

Minimax M3 democratizes a capability that was previously locked behind proprietary APIs. For developers who need to process vast amounts of text or images in a single prompt, this open-weight model offers a compelling, cost-effective alternative. It is not a perfect replacement for GPT-4 or Claude in every scenario, but for long-context tasks it now sets a new open standard.
