4% of the Brain, 100% of the Performance

Everyone's been talking about always-on AI agents. An assistant that's always listening and working in the background, ready to tackle whatever you throw at it. The idea took off after tools like Manus and OpenAI's Operator went viral earlier this year. The vision is great. The price tag kills it.

Running a frontier AI model 24/7 for a full year, constantly generating tokens, would cost you tens of thousands of dollars. For most builders and small teams, that puts the always-on agent firmly in the "cool but impractical" category.

Then I came across a number that made me stop scrolling: $1,892 per year. That's the estimated annual cost of running Minimax's new M2.5 model around the clock at 50 tokens per second. An always-on AI agent that matches the coding performance of Anthropic's Opus 4.6, for less than $160 a month.

The gap between those two numbers changes what's possible.

The Minimax Moment

I watched a detailed technical breakdown of Minimax M2.5 recently, and what caught my attention wasn't the benchmarks (though those are impressive). It was the economics behind them.

Minimax built a 230 billion parameter model that only activates 10 billion parameters per token. That's 4% of the total model lighting up for each piece of output. The technique is called Mixture of Experts, and while the concept isn't new, the execution here is striking.

At that 4% activation rate, M2.5 hits 80% on SWE-Bench Verified. That puts it neck and neck with Anthropic's Opus 4.6, the model I use daily through Claude Code. And Minimax serves M2.5 at roughly 3% of Opus 4.6's output token cost, at 100 tokens per second, nearly twice the speed.

Same intelligence at a fraction of the price, and nearly twice the speed.

The input token cost sits at $0.30 per million. Output at $1.20 per million. And Minimax has been shipping on roughly 50-day release cycles, pushing performance up at each step while holding costs flat. If you've been following the AI model race, you know this trajectory. But the magnitude matters. This isn't 20% cheaper or 30% faster. It's an order of magnitude shift in what intelligence costs to run.
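The $1,892 figure follows directly from that output price. Here's the back-of-the-envelope math, assuming output tokens dominate the bill for a continuously generating agent:

```python
OUTPUT_PRICE_PER_MILLION = 1.20  # dollars per million output tokens
TOKENS_PER_SECOND = 50
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

# A continuously running agent at 50 tok/s emits ~1.58 billion tokens a year.
tokens_per_year = TOKENS_PER_SECOND * SECONDS_PER_YEAR
annual_cost = tokens_per_year / 1_000_000 * OUTPUT_PRICE_PER_MILLION

print(f"${annual_cost:,.0f} per year")      # → $1,892 per year
print(f"${annual_cost / 12:,.0f} per month")  # → $158 per month
```

Run the same arithmetic with a frontier model's output price and you land in the tens of thousands, which is the gap the rest of this piece is about.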

The 4% Brain

Mixture of Experts is easier to grasp than it sounds.

Picture a company with 230 employees. Every time a task comes in, only 10 people work on it. The other 220 are available but idle for that specific request. Different tasks activate different teams of 10. The total knowledge of the company is massive (all 230 people), but the cost of any single task is small (10 people's time).

That's what's happening inside M2.5. The model packs trillions of tokens of training data into 230 billion parameters of stored knowledge. But during inference (when you're actually using it), only 10 billion parameters activate per token. The model knows a lot. It doesn't need to use everything at once.

The cost of running an AI model scales with how many parameters you activate, not how many exist in total. A 230 billion parameter model that only fires 10 billion per token is far cheaper to serve than one that fires all 230 billion every time.
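The routing mechanism behind this is simpler than it sounds. Here's a minimal sketch of top-k expert gating in plain Python: a small linear gate scores every expert for the incoming token, and only the k highest-scoring experts actually run. The expert count, dimensions, and gate shown here are illustrative, not M2.5's actual architecture:

```python
import math
import random

def route_token(token, gate, k=2):
    """Score each expert with a linear gate, then keep only the top-k.

    Returns a dict mapping the chosen expert indices to their mixing
    weights (a softmax over just the selected experts).
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate]
    top_k = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    m = max(logits[i] for i in top_k)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top_k}
    total = sum(exps.values())
    return {i: v / total for i, v in exps.items()}

random.seed(0)
num_experts, dim = 8, 16
gate = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [random.gauss(0, 1) for _ in range(dim)]

weights = route_token(token, gate, k=2)
# Only the 2 selected expert networks run for this token; their outputs
# get combined using these weights. Compute scales with k, not num_experts.
```

The key property is in the last comment: inference cost tracks k, the number of active experts, while stored knowledge tracks the full expert pool. That's the 10-billion-active-out-of-230-billion-total trick in miniature.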

I use AI models every day to build things. Claude Code with Opus for complex architecture decisions. Smaller, faster models for quick iterations. And what I've noticed over the past year is that the biggest practical constraint isn't intelligence anymore. It's the cost and speed tradeoff. A model can be brilliant, but if it's slow or expensive, I find myself rationing my usage. Saving it for the hard problems and thinking twice before hitting enter on a prompt that might burn through dollars of tokens.

When the cost drops by 97%, that rationing disappears. You start using AI for things you wouldn't have bothered automating before: small refactors, exploratory questions, "let me just check this real quick" moments that currently don't feel worth the token cost.

The Salary Comparison

The breakdown I watched framed this as hiring an employee, and the analogy stuck with me. If you run different AI models 24/7 for a full year (constantly outputting tokens, like leaving the tap running in the bathroom), each model has an effective "annual salary."

Frontier models like Opus 4.6 or GPT-5.2 cost tens of thousands per year at continuous operation. That's real money, especially if you're a solo builder or a small team experimenting with autonomous agents.

Minimax M2.5? $1,892 per year at 50 tokens per second, running continuously for twelve months.

That's less than $160 a month. Less than most SaaS subscriptions people forget to cancel. Less than a junior developer's daily rate in most cities. And the model is performing at the same level as Opus 4.6 on coding benchmarks.

This is what makes the always-on agent real. Something you can actually deploy and leave running without watching your API bill climb by the hour.

I've been building AI tools for over a year now, and cost has been the invisible ceiling on every ambitious idea. "What if the agent could continuously monitor and improve the codebase?" Great idea. Can't afford to leave Opus running 24/7. "What if it proactively flagged issues before they hit production?" Love it. The token bill would be absurd.

At M2.5 pricing, those ideas become line items in a reasonable monthly budget. And interesting things tend to happen once experimentation gets cheap.

Cheap Intelligence Doesn't Kill Demand. It Creates It.

The story is bigger than one model.

One of the loudest debates in AI right now is whether the falling cost of intelligence means companies like OpenAI, Oracle, and xAI are overinvesting in GPU infrastructure. The logic seems straightforward: if models get more efficient, we need fewer GPUs. If we need fewer GPUs, the tens of billions being poured into data centers might be a bubble waiting to pop.

History tells a different story. Jevons Paradox, named after a 19th century economist, describes what happened when coal engines became more efficient: total coal consumption went up, not down. Efficiency made coal cheaper per unit of work, which made more applications viable, which drove aggregate demand through the roof.

The same dynamic is playing out with AI right now. When intelligence costs $1,892 a year instead of $50,000, you don't just save money on your existing workloads. You start deploying AI in places you never would have considered: background agents monitoring code quality, continuous automated testing, personalized education at scale. The use cases multiply because the economics finally allow them to exist.

I see this in my own work. When I started building with AI coding tools, I used them carefully, mostly for architectural decisions and complex debugging. Now, as costs drop and speed improves, I use them for almost everything. Quick refactors. Documentation. Even exploratory questions I would have just searched for myself a year ago. Not because I got lazier. Because the cost of intelligence dropped below the cost of my own time for increasingly routine tasks.

Minimax M2.5 is one of the clearest signals yet that intelligence is approaching the cost of electricity. And when any resource gets that cheap, usage goes through the roof.

What This Means If You're Building

I teach executives about AI adoption, and the number one objection I hear is cost. "We ran a pilot. The API costs were too high to scale." That objection is disappearing. Models like M2.5 are eroding it one release cycle at a time.

If you're a builder, revisit the architectures you dismissed as too expensive six months ago. Always-on agents, continuous background processing, multi-agent workflows where several models collaborate on different parts of a problem. The economics now support those experiments.

If you're leading AI adoption for your organization, the mental model needs to shift. AI is becoming infrastructure, something that runs continuously like electricity or internet access. The pricing trajectory points there whether we're ready or not.

We're not at "AI on your phone" yet. M2.5 still needs around 400GB of VRAM, which is far from pocket-sized. But the weights will be open source soon, and quantized local versions will follow. The direction is unmistakable, even if running frontier intelligence on a device in your pocket is still a couple of years out.

A model that uses 4% of its brain and matches the best in the world tells you where the next 12 months are headed for everyone building with AI.

Intelligence is becoming a utility. What would you build if it cost less than your electricity bill?