/
Kimi K2.5 is now on Overchat AI — The First Open-Source Model to Beat Opus 4.5
Last Updated:
Jan 31, 2026

Kimi K2.5 is now on Overchat AI — The First Open-Source Model to Beat Opus 4.5

Moonshot AI released Kimi K2.5 — their most powerful AI model to date. It's the first open-source artificial intelligence model to beat giants like GPT-5.2 and Claude Opus 4.5 on many benchmarks — at a fraction of the cost.

The model is already live on Overchat AI, and you can start chatting with Kimi K2.5 here.

So, what makes this Moonshot AI model so exceptional? It combines a very strong ability to understand images and videos, a cutting-edge agent mode and best-in-class performance in coding and creative writing. This makes it the most well-rounded offering on the market. Read on for all the details.

What is Kimi K2.5?

Kimi K2.5 is Moonshot AI's flagship open-source model. It was released on January 27, 2026, and Moonshot positions it  as an alternative to GPT-5.2 and Claude Opus 4.5.

It's built on a mixture-of-experts architecture with 1 trillion total parameters, though only 32 billion activate per request. As a result, the model is both very fast and very accurate.

When we tested it, it felt as though its performance was on a par with that of the Claude Opus 4.5, a view that is reinforced by the benchmarks. We will discuss this in more detail later. It took about half the time to produce an answer.

Another standout feature of Kimi K2.5 is its ability to accurately understand images and videos. It boasts a 400-million-parameter vision encoder called MoonViT that is specifically responsible for interpreting images and videos. What does this mean in practice? 

For example, provide a design mock-up as a screenshot and it will produce it to pixel-perfect specification.

The model comes in multiple configurations:

  • K2.5 Instant — for faster responses
  • K2.5 Thinking — for complex problems
  • K2.5 Agent — when you use outside tools
  • K2.5 Agent Swarm (Beta) — to run up to 100 agents working on large task simultaneously

Agent Swarm mode is unique — we’ve never seen capabilities like this outside of custom enterprise solutions. But how does it work in practice? Kimi K2.5 can direct up to 100 sub-agents independently, each working on a separate task. According to Moonshot, this speeds up the model by 4.5x compared to when just one AI is working.

It's a bit disappointing that the speed increase isn't linear, i.e. making it 100 times faster, but this is still impressive.

Kimi K2.5 Benchmarks

Benchmark performance doesn't tell the full story, but these numbers establish where K2.5 sits among frontier models.

Core Benchmarks

Benchmark Kimi K2.5
HLE (with tools) 50.2%
BrowseComp 74.9%
AIME 2025 96.1%
GPQA Diamond 87.6%
SWE-Bench Verified 76.8%
LiveCodeBench v6 85.0%

Vision Benchmarks

Benchmark Kimi K2.5
MMMU Pro 78.5%
MathVision 84.2%
VideoMMMU 86.6%

Kimi K2.5 vs Other AI Models

Now let's take a look at how the Kimi K2.5 compares with other models, including both proprietary flagship products and open-source competitors.

Kimi K2.5 vs Claude Opus 4.5

Benchmark Kimi K2.5 Claude Opus 4.5 Winner
HLE (with tools) 50.2% ~45% Kimi K2.5
SWE-Bench Verified 76.8% 80.9% Claude Opus 4.5
BrowseComp 74.9% ~24% Kimi K2.5
Benchmark Run Cost $0.27 $1.14 Kimi K2.5

Kimi K2.5 actually beats Claude Opus 4.5 in most tests! While costing much less to run, but more on this later.

Kimi K2.5 vs ChatGPT (GPT-5.2)

Benchmark Kimi K2.5 GPT-5.2 xhigh Winner
AIME 2025 96.1% 100% GPT-5.2
HLE (with tools) 50.2% ~42% Kimi K2.5
LiveCodeBench v6 85.0% 87.0% GPT-5.2
Benchmark Run Cost $0.27 $0.48 Kimi K2.5

The results are more mixed: GPT-5.2 performs better at maths, but K2.5 performs better in complex tasks, especially with regard to agents. However, we're splitting hairs here.

Kimi K2.5 vs Gemini 3 Pro

Benchmark Kimi K2.5 Gemini 3 Pro Winner
GPQA Diamond 87.6% 91.9% Gemini 3 Pro
VideoMMMU 86.6% 87.6% Gemini 3 Pro

Tests show that the Gemini 3 Pro outperforms the K2.5 in terms of scientific reasoning and video understanding, despite the fact that the Kimi’s architecture was specifically designed for understanding visual media. This is interesting, but we didn't find Gemini to be more accurate in practice.

Kimi K2.5 vs DeepSeek V3.2

Benchmark Kimi K2.5 DeepSeek V3.2 Winner
HLE (with tools) 50.2% ~46% Kimi K2.5
SWE-Bench Verified 76.8% ~75% Kimi K2.5

Kimi K2.5 crushes DeepSeek V3.2 in all tests. With that being said one has to wonder, when DeepSeek 4 releases, will it beat Kimi 2.5?

Kimi K2.5 Pricing

In terms of pricing, the Kimi K2.5 is one of the most cost-effective models relative to its performance. Generally speaking, only closed-source models perform at a comparable level, and those cost two to four times more to run.

API Pricing

  • Input (cache miss): $0.60
  • Input (cache hit): $0.10–$0.30
  • Output: $2.50–$3.00

For context, running a full benchmark suite on Kimi K2.5 costs roughly 4x less than Claude Opus 4.5 and nearly half of GPT-5.2.

Where to Access Kimi K2.5

One of the fastest ways to try Kimi K2.5 is on Overchat AI. You can start chatting with the model right now:

👉 Chat with Kimi K2.5 on Overchat AI

Kimi.com and the Kimi App

Kimi K2.5 is also available on Kimi[.]com and the Kimi mobile app, with four modes: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (Beta). Agent Swarm is currently available for high-tier paid users.

For Developers

Developers can access Kimi K2.5 through the Moonshot API or third-party providers like Together AI and Fireworks. The API is OpenAI-compatible, so you can swap it into existing workflows with minimal changes. The context window is 256K tokens.

Bottom Line

Kimi K2.5 is the strongest open-weights model available for agentic tasks. It achieved global SOTA on Humanity's Last Exam and BrowseComp, and matched proprietary models on coding benchmarks. The agent swarm capability — 100 sub-agents running 1,500 tool calls at the same time — is something no other model offers.

👉 Start chatting with Kimi K2.5

FAQs

What is Kimi K2.5?

Kimi K2.5 is Moonshot AI's flagship open-weights model, released on January 27, 2026. It features a 1 trillion parameter mixture-of-experts architecture (32B active), native multimodality with image and video understanding, and an agent swarm system that can orchestrate up to 100 sub-agents. It achieves state-of-the-art scores on agentic benchmarks like HLE (50.2%) and BrowseComp (74.9%).

What is the Kimi K2.5 release date?

Kimi K2.5 was officially released on January 27, 2026, though some users reported the model was silently rolled out to Kimi.com a few days earlier with improved fact-checking and vision capabilities.

Which is better, Kimi K2.5 vs Claude Opus 4.5?

It depends on the use case. For example, Kimi K2.5 outperforms Claude Opus 4.5 on agentic benchmarks, but for pure coding Claude still achieves higher scores (SWE-Bench Verified: 80.9% vs 76.8%). That being said, Kimi K2.5 also costs about 4x less.

Which is better, Kimi K2.5 vs GPT-5.2?

GPT-5.2 is still the best AI model for math, (it scores 100% on AIME 2025, Kimi scores 96.1%). For agentic workflows the view is quite different: Kimi K2.5 wins by 8% margin, which is significant. And again, it costs a fraction of GPT 5.2

Where can I access Kimi K2.5?

You can access Kimi K2.5 on Overchat AI right now. It's also available on Kimi.com, the Kimi mobile app, and through the Moonshot API for developers.

Is Kimi K2.5 free?

Yes, you can try Kimi K2.5 for free on Overchat AI. It also offers free tier access on Kimi[.]com with limited usage. What’s more, being an open-source model under a Modified MIT License, you can run it locally — but you’ll need very good hardware to launch it.