Chinese AI lab MiniMax has introduced MiniMax-M2.5, positioning it as a “real-world productivity” model trained with reinforcement learning across hundreds of thousands of complex environments.
MiniMax says M2.5 reaches state-of-the-art performance across high-value tasks like coding, agentic tool use, and search. The company highlights benchmark results including 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp (with context management).
On speed, MiniMax reports M2.5 completes SWE-Bench Verified evaluations 37% faster than M2.1, with runtime roughly on par with Claude Opus 4.6 — but at a fraction of the cost.
MiniMax says a notable improvement is M2.5’s tendency to write specs and decompose projects before coding, acting more like a senior engineer who plans features, structure, and UI design first. The company states the model was trained on more than 10 programming languages (Go, C/C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby) in over 200,000 real-world environments spanning web and multi-platform software stacks.
MiniMax emphasizes stronger generalization across unfamiliar agent scaffolds, plus improved decision-making: across several agentic evaluations, M2.5 reportedly achieves comparable or better outcomes while using ~20% fewer search rounds than M2.1.
MiniMax frames M2.5 as the first “frontier” model where users “do not need to worry about cost.” The company claims it can run continuously for about $1/hour at ~100 tokens/sec (and $0.30/hour at ~50 tokens/sec), explicitly arguing this pushes the industry toward “intelligence too cheap to meter.”
MiniMax is releasing two variants: M2.5 and M2.5-Lightning (same capability, different speed). The company states Lightning runs at ~100 tokens/sec and lists pricing of $0.30/M input tokens and $2.40/M output tokens, while standard M2.5 runs at ~50 tokens/sec at about half the price. Both support caching.
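The hourly figures follow from simple arithmetic on throughput and per-token price. The sketch below is a back-of-the-envelope check, not MiniMax’s own calculation. It assumes every generated token is billed at the output rate, ignores input and cached-token charges, and takes “about half the price” for standard M2.5 to mean $1.20/M output tokens. The helper name `hourly_output_cost` is hypothetical.

```python
# Back-of-the-envelope check of the reported "cost per hour" figures.
# Assumptions (not from MiniMax): all generated tokens are billed at the
# output rate; input and cached-token charges are ignored.

SECONDS_PER_HOUR = 3600


def hourly_output_cost(tokens_per_sec: float, usd_per_million_output: float) -> float:
    """Dollar cost of generating output tokens continuously for one hour."""
    tokens_per_hour = tokens_per_sec * SECONDS_PER_HOUR
    return tokens_per_hour / 1_000_000 * usd_per_million_output


# M2.5-Lightning as reported: ~100 tokens/sec at $2.40/M output tokens.
lightning = hourly_output_cost(100, 2.40)

# Standard M2.5: ~50 tokens/sec; $1.20/M output is an assumed reading
# of "about half the price".
standard = hourly_output_cost(50, 1.20)

print(f"M2.5-Lightning: ${lightning:.2f}/hour")
print(f"M2.5:           ${standard:.2f}/hour")
```

Under these assumptions, output alone comes to roughly $0.86/hour for Lightning and $0.22/hour for standard M2.5. Both land just below MiniMax’s ~$1 and ~$0.30 figures. The gap would plausibly be covered by input-token charges, which this sketch leaves out.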
MiniMax says M2.5 is already deployed inside “MiniMax Agent,” and claims that internally 30% of daily tasks are autonomously completed by M2.5 across teams (R&D, product, sales, HR, finance), with M2.5-generated code accounting for 80% of newly committed code.
Why it matters: If these speed and pricing claims hold up in real deployments, models like M2.5 shift the “agentic” story from occasional automation to always-on agents that can iterate, search, and execute workflows 24/7, without cost becoming the limiting factor.