Claude Opus 4.6 is now on Overchat AI — Anthropic's Best Model Sets New Records
Last Updated: Apr 23, 2026

Anthropic released Claude Opus 4.6 on February 5, 2026, and it’s a major upgrade to its flagship model.

The update brings a 1-million-token context window in beta (up from 200,000), adaptive reasoning that kicks in only when a task needs it, and stronger coding performance. Prices are unchanged.

The model is already live on Overchat AI, and you can start chatting with Claude Opus 4.6 now.

But is this just an incremental update, or does it meaningfully improve real workflows? Let's find out.

TLDR

  • Claude Opus 4.6 is Anthropic's most advanced model (released February 5, 2026), setting new highs on Terminal-Bench 2.0 (65.4%) and Humanity's Last Exam (53.1%).
  • The headline feature is a 1M token context window in beta — a 5x jump over Opus 4.5's 200K, putting it level with Gemini 3 Pro.
  • Adaptive Thinking replaces the old reasoning toggle — Claude now decides automatically when to reason vs answer directly, though early testing shows it's sometimes overcautious.
  • Agent Teams (preview) lets multiple Claude instances work in parallel on different parts of a project — useful for enterprise workflows.
  • Context Compaction automatically summarizes context server-side when it's close to maxing out, so long-running tasks don't stall.
  • Max output doubled to 128K tokens, and early users report big wins on legal documents, writing quality, and fewer predictable stock phrases.
  • vs Opus 4.5: improves on nearly every benchmark, essentially tied on SWE-Bench Verified (80.8% vs 80.9%).
  • vs GPT-5.2: Opus 4.6 wins on most benchmarks (GPT-5.2 edges ahead on MCP Atlas), though GPT-5.2 is cheaper on output ($15/M vs $25/M).
  • vs Gemini 3 Pro: Gemini leads on reasoning and context window, but Opus 4.6 is still the better coder.
  • Pricing is unchanged from Opus 4.5 despite the performance gains, a pleasant surprise given the recent trend of more powerful models costing more. It's still the most expensive of the three models compared here.

Introduction

Opus 4.6 sets a new bar for coding models. On Terminal-Bench 2.0 it holds the top spot with 65.4%, and on Humanity's Last Exam (with tools) it scores 53.1%.

Anthropic's models are widely regarded as among the strongest for coding, and this one is no exception. That holds even without the 1M context window, which is currently in beta and available only to API users.

What is Claude Opus 4.6?

Claude Opus 4.6 is Anthropic's most advanced AI model, released on February 5, 2026 as an upgrade to Opus 4.5 (which launched in November 2025).

The headline feature is the one-million-token context window (in beta). This is a fivefold increase over the 200K limit of Opus 4.5, placing it on par with Google's Gemini 3 Pro. Previously, many users noted that Claude models excel at design and front-end coding but lose track of earlier material in long sessions because of the limited window. The 1M window is meant to address that.

Claude Opus 4.6 Main Features

A larger context window isn't the only new feature. Most of the other additions matter mainly for enterprise use cases, but some change how the model behaves in everyday chats. Let's break them down.

Adaptive Thinking: This replaces the old on/off reasoning toggle. Similar to ChatGPT, Claude now either answers right away for easy tasks or activates reasoning for complex ones. In testing, I found this hit-or-miss: it sometimes switches on reasoning for things I'd consider very simple.

Agent Teams: Aimed at enterprise and power users, this lets multiple Claude instances work in parallel on different parts of a project. Currently in preview.

Context Compaction: This is a server-side summarization feature. When the context is close to its limit, Claude condenses what it holds so that long-running tasks don't stall.

128K Max Output: The maximum length of a single response has doubled to 128K tokens.
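To make the compaction idea concrete, here is a minimal client-side sketch in Python. It is not Anthropic's implementation, which runs server-side and isn't described in detail in this article. The names `Message`, `estimate_tokens`, `summarize`, and `compact`, the 4-characters-per-token estimate, and the 80% threshold are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def summarize(messages: list[Message]) -> Message:
    # Placeholder summary: a real system would ask a model to write this.
    joined = " ".join(m.text for m in messages)
    return Message("system", "Summary of earlier turns: " + joined[:80])


def compact(history: list[Message], limit: int,
            threshold: float = 0.8, keep_recent: int = 2) -> list[Message]:
    """Replace older messages with one summary once usage nears the limit."""
    used = sum(estimate_tokens(m.text) for m in history)
    if used < limit * threshold or len(history) <= keep_recent:
        return history  # plenty of room left, nothing to do
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent


if __name__ == "__main__":
    history = [Message("user", "x" * 400) for _ in range(6)]  # ~600 tokens
    compacted = compact(history, limit=700)
    print(len(history), "->", len(compacted), "messages")
```

The design choice worth noting is that the most recent turns are kept verbatim while only older material is condensed, so the task in progress keeps its exact wording.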

On Reddit, early reviews say Opus 4.6 is impressive at reading and drafting legal documents. Its writing has improved as well: it leans less on predictable stock phrases and uses a broader vocabulary.

In short, the update improves both the feature set and everyday output quality.

Claude Opus 4.6 Benchmarks

Benchmark performance gives a rough sense of where Opus 4.6 sits, but the real-world improvements — better planning, self-correction, sustained focus — are what developers are reporting.

Coding:

| Benchmark | Opus 4.6 Score |
| --- | --- |
| Terminal-Bench 2.0 | 65.4% |
| SWE-Bench Verified | 80.8% |
| OSWorld (Computer Use) | 72.7% |
| τ2-Bench Retail | 91.9% |
| MCP Atlas | 59.5% |

Reasoning and knowledge:

| Benchmark | Opus 4.6 Score |
| --- | --- |
| HLE (with tools) | 53.1% |
| HLE (without tools) | 40.0% |
| GDPval-AA | 1606 Elo |
| BrowseComp | 84.0% |
| ARC AGI 2 | 68.8% |
| BigLaw Bench | 90.2% |
| Finance Agent | 60.7% |

Long context retention (higher is better):

| Benchmark | Opus 4.6 Score | Sonnet 4.5 Score |
| --- | --- | --- |
| MRCR v2 (1M, 8-needle) | 76% | 18.5% |
| MRCR v2 (256K, 8-needle) | 93% | 10.8% |

Claude Opus 4.6 vs Other AI Models

Let’s see how the new model compares against other top models — both from Anthropic and competitors.

Claude Opus 4.6 vs Opus 4.5

Opus 4.6 improves over Opus 4.5 on every benchmark listed here except SWE-Bench Verified, where the two are essentially tied (80.8% vs 80.9%).

| Benchmark | Opus 4.6 | Opus 4.5 | Improvement |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 59.8% | +5.6pp |
| OSWorld | 72.7% | 66.3% | +6.4pp |
| ARC AGI 2 | 68.8% | 37.6% | +31.2pp |
| GDPval-AA | 1606 Elo | ~1416 Elo | +190 Elo |
| Context Window | 1M (beta) | 200K | 5x increase |
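The Improvement column mixes different kinds of change: percentage points (a plain difference between two scores), an Elo difference, and a ratio for the context window. As a quick sanity check, the sketch below recomputes those figures from the table and also shows the relative gain, which tells a different story than the point difference. The helper names `pp_gain` and `relative_gain` are illustrative only.

```python
# Scores copied from the comparison table: (Opus 4.6, Opus 4.5), in percent.
scores = {
    "Terminal-Bench 2.0": (65.4, 59.8),
    "OSWorld": (72.7, 66.3),
    "ARC AGI 2": (68.8, 37.6),
}


def pp_gain(new: float, old: float) -> float:
    """Difference in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


for name, (new, old) in scores.items():
    print(f"{name}: +{pp_gain(new, old)}pp ({relative_gain(new, old)}% relative)")

# The Opus 4.5 Elo figure is approximate in the table (~1416).
elo_gain = 1606 - 1416
context_ratio = 1_000_000 / 200_000
print(f"GDPval-AA: +{elo_gain} Elo, context window: {context_ratio:.0f}x")
```

For example, the 31.2-point jump on ARC AGI 2 corresponds to a relative improvement of roughly 83% over the Opus 4.5 score.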

Claude Opus 4.6 vs GPT-5.2

Compared to GPT-5.2, Opus 4.6 wins on most benchmarks. GPT-5.2 edges ahead on MCP Atlas, and it also charges less for output tokens: $15/M vs $25/M.

| Benchmark | Opus 4.6 | GPT-5.2 | Winner |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 64.7% | Opus 4.6 |
| GDPval-AA | 1606 Elo | ~1462 Elo | Opus 4.6 |
| HLE (with tools) | 53.1% | ~42% | Opus 4.6 |
| BrowseComp | 84.0% | Lower | Opus 4.6 |
| SWE-Bench Verified | 80.8% | 80.0% | Opus 4.6 |
| MCP Atlas | 59.5% | 60.6% | GPT-5.2 |

Claude Opus 4.6 vs Gemini 3 Pro

Gemini 3 Pro is the first model here that beats Opus 4.6 in meaningful ways, specifically on reasoning and context window size. It is the weaker coder, though.

| Benchmark | Opus 4.6 | Gemini 3 Pro | Winner |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 56.2% | Opus 4.6 |
| OSWorld | 72.7% | Lower | Opus 4.6 |
| GPQA Diamond | ~85% | 91.9% | Gemini 3 Pro |
| Context Window | 1M (beta) | 2M | Gemini 3 Pro |

Claude Opus 4.6 Pricing

Anthropic kept pricing identical to Opus 4.5, which is welcome, and a bit surprising, given the performance gains. Recent releases of more powerful models have tended to come with higher prices, but that's not the case here.

Standard API pricing is as follows:

| Token Type | Price per 1M Tokens |
| --- | --- |
| Input (standard) | $5.00 |
| Input (cache read) | $0.50 |
| Output | $25.00 |

Long-context pricing kicks in above 200K tokens:

| Token Type | Price per 1M Tokens |
| --- | --- |
| Input | $10.00 |
| Output | $37.50 |
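To see what these rates mean for a single request, here is a small Python estimator built from the two price tables. It rests on two assumptions that the tables do not spell out: that a request is billed entirely at long-context rates once its input exceeds 200K tokens, and that cached input is only priced at the standard tier (no long-context cache-read price is listed). The function name `request_cost` is made up for this sketch.

```python
# USD per 1M tokens, taken from the two price tables.
STANDARD = {"input": 5.00, "cache_read": 0.50, "output": 25.00}
LONG_CONTEXT = {"input": 10.00, "output": 37.50}
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens


def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    input_tokens  -- uncached input tokens
    cached_tokens -- input tokens served from the cache
    """
    total_input = input_tokens + cached_tokens
    if total_input > LONG_CONTEXT_THRESHOLD:
        # Assumption: the whole request moves to the long-context tier.
        if cached_tokens:
            raise ValueError("no long-context cache-read price is listed")
        rates, cache_cost = LONG_CONTEXT, 0.0
    else:
        rates = STANDARD
        cache_cost = cached_tokens * STANDARD["cache_read"]
    total = (input_tokens * rates["input"]
             + cache_cost
             + output_tokens * rates["output"])
    return total / 1_000_000


if __name__ == "__main__":
    print(f"${request_cost(50_000, 2_000):.2f}")
    print(f"${request_cost(300_000, 10_000):.2f}")
```

Under these assumptions, a 50,000-token prompt with a 2,000-token reply costs about $0.30, while a 300,000-token prompt with a 10,000-token reply costs about $3.38.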

For a bit of context, Opus 4.6 is the most expensive model on this list. Here’s how it compares in terms of price against competitors:

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 |
| GPT-5.2 | ~$5.00 | $15.00 |
| Gemini 3 Pro | $2.00 | $12.00 |
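How much the gap matters depends on the mix of input and output tokens. The Python sketch below applies the prices from the comparison table to one hypothetical workload and ranks the models by cost. It ignores long-context surcharges and caching, treats the approximate GPT-5.2 input price as exactly $5.00, and uses the made-up helper names `workload_cost` and `rank_by_cost`.

```python
# (input, output) USD per 1M tokens, from the comparison table.
PRICES = {
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.2": (5.00, 15.00),  # input price is approximate in the table
    "Gemini 3 Pro": (2.00, 12.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload at standard rates for the given model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Models sorted from most to least expensive for this workload."""
    costs = [(m, workload_cost(m, input_tokens, output_tokens)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical workload: 100K input tokens, 20K output tokens.
    for model, cost in rank_by_cost(100_000, 20_000):
        print(f"{model}: ${cost:.2f}")
```

For this workload the totals come to about $1.00 for Opus 4.6, $0.80 for GPT-5.2, and $0.44 for Gemini 3 Pro, so the difference is driven mostly by output pricing.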

If you want to chat with Claude Opus 4.6 without worrying about these API prices, you can head to Overchat AI and use the model as part of a single subscription, which also includes GPT-5.2, Kimi K2, the latest Gemini models, and more.

Bottom Line

Claude Opus 4.6 is Anthropic's strongest model yet. The 1M context window, automatic context compaction, and adaptive reasoning may not be game-changing in isolation, but they compound. The result is a model that feels better to work with, needs fewer attempts per task, and performs even more consistently than its already very consistent predecessor.

If you'd like to test it yourself, start chatting with Claude Opus 4.6 on Overchat AI today.