Introduction
Opus 4.6 is the new gold standard among coding models. On Terminal-Bench 2.0, it holds the top spot with 65.4%, and on Humanity's Last Exam, it scores 53.1%.
Anthropic's coding models are generally considered the best in the world, and this one is no exception. It sets a new standard even without the 1M context window, which is currently in beta and available only to API users.
What is Claude Opus 4.6?
Claude Opus 4.6 is Anthropic's most advanced AI model, released on February 5, 2026, as an upgrade to Opus 4.5 (which launched in November 2025).
The headline feature is the one-million-token context window (in beta), a fivefold increase over Opus 4.5's 200K limit that places it on par with Google's Gemini 3 Pro. Many users have long noted that Claude models excel at design and front-end coding but lose track of long sessions because of the limited window. With the 1M window rolling out, that limitation is on its way to becoming a thing of the past.
Claude Opus 4.6 Main Features
An increased context window isn’t the only new feature of this model; there are other improvements as well. Most of these improvements only matter for enterprise use cases, but some are fundamental to how the model works. You'll surely notice these improvements when chatting with the model, so let's break them down.
Adaptive Thinking — This replaces the old on/off reasoning toggle. Similar to ChatGPT, Claude now answers simple prompts right away and activates extended reasoning for complex ones. In testing, I found the switching iffy: it sometimes turns on reasoning for tasks I'd consider very simple.
Agent Teams — A preview feature for enterprise and power users: multiple Claude instances work in parallel on different parts of a project.
Context Compaction — A server-side summarization feature. When the context window is close to full, Claude condenses the information it holds so that long-running tasks don't stop.
128K Max Output — The amount of text the model can output has been doubled.
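The features above are mostly API-facing, so here is what a request touching them might look like. This is a minimal sketch only: the model ID `claude-opus-4-6` and the `anthropic-beta` header value are assumptions, not confirmed identifiers — check Anthropic's documentation for the real names. Note there is no explicit thinking flag, since adaptive thinking is decided by the model itself.

```python
# Sketch of a Messages API request payload for Opus 4.6.
# ASSUMPTIONS: the model ID and the beta header value are placeholders;
# the real identifiers live in Anthropic's API docs.

def build_request(prompt: str, long_context: bool = False) -> dict:
    """Assemble a request payload for the (hypothetical) Opus 4.6 model."""
    request = {
        "model": "claude-opus-4-6",   # assumed model ID
        "max_tokens": 128_000,        # the new, doubled output ceiling
        "messages": [{"role": "user", "content": prompt}],
    }
    if long_context:
        # The 1M window is beta-gated for API users; this header name
        # is illustrative, not the documented one.
        request["extra_headers"] = {"anthropic-beta": "context-1m"}
    return request

req = build_request("Summarize this repository.", long_context=True)
```

A standard request simply omits the beta header and runs against the default 200K window.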
On Reddit, early reviewers say Opus 4.6 is impressive at drafting and analyzing legal documents. Its writing has improved as well: it leans on fewer stock phrases and draws on a broader vocabulary.
In short, it's a solid step up across the board.
Claude Opus 4.6 Benchmarks
Benchmark performance gives a rough sense of where Opus 4.6 sits, but the real-world improvements — better planning, self-correction, sustained focus — are what developers are reporting.
Coding:
| Benchmark | Opus 4.6 Score |
| --- | --- |
| Terminal-Bench 2.0 | 65.4% |
| SWE-Bench Verified | 80.8% |
| OSWorld (Computer Use) | 72.7% |
| τ2-Bench Retail | 91.9% |
| MCP Atlas | 59.5% |
Reasoning and knowledge:
| Benchmark | Claude Opus 4.6 |
| --- | --- |
| HLE (with tools) | 53.1% |
| HLE (without tools) | 40.0% |
| GDPval-AA | 1606 Elo |
| BrowseComp | 84.0% |
| ARC AGI 2 | 68.8% |
| BigLaw Bench | 90.2% |
| Finance Agent | 60.7% |
Long context retention (higher is better):
| Benchmark | Opus 4.6 Score | Sonnet 4.5 Score |
| --- | --- | --- |
| MRCR v2 (1M, 8-needle) | 76% | 18.5% |
| MRCR v2 (256K, 8-needle) | 93% | 10.8% |
Claude Opus 4.6 vs Other AI Models
Let’s see how the new model compares against other top models — both from Anthropic and competitors.
Claude Opus 4.6 vs Opus 4.5
Opus 4.6 improves on Opus 4.5 across every benchmark except SWE-Bench Verified, where the two are essentially tied (80.8% vs 80.9%).
| Benchmark | Opus 4.6 | Opus 4.5 | Improvement |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 59.8% | +5.6pp |
| OSWorld | 72.7% | 66.3% | +6.4pp |
| ARC AGI 2 | 68.8% | 37.6% | +31.2pp |
| GDPval-AA | 1606 Elo | ~1416 Elo | +190 Elo |
| Context Window | 1M (beta) | 200K | 5x increase |
Claude Opus 4.6 vs GPT-5.2
Compared to GPT-5.2, Opus 4.6 wins pretty much across the board, though it's worth noting that OpenAI's model prices output tokens lower: $15/M vs $25/M.
| Benchmark | Opus 4.6 | GPT-5.2 | Winner |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 64.7% | Opus 4.6 |
| GDPval-AA | 1606 Elo | ~1462 Elo | Opus 4.6 |
| HLE (with tools) | 53.1% | ~42% | Opus 4.6 |
| BrowseComp | 84.0% | Lower | Opus 4.6 |
| SWE-Bench Verified | 80.8% | 80.0% | Opus 4.6 |
| MCP Atlas | 59.5% | 60.6% | GPT-5.2 |
Claude Opus 4.6 vs Gemini 3 Pro
Gemini 3 Pro is the first model on this list that beats Opus 4.6 in meaningful ways, specifically in reasoning and context window size. It is, however, the weaker coder.
| Benchmark | Opus 4.6 | Gemini 3 Pro | Winner |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 56.2% | Opus 4.6 |
| OSWorld | 72.7% | Lower | Opus 4.6 |
| GPQA Diamond | ~85% | 91.9% | Gemini 3 Pro |
| Context Window | 1M (beta) | 2M | Gemini 3 Pro |
Claude Opus 4.6 Pricing
Anthropic kept pricing identical to Opus 4.5, which is great given the performance gains, and a bit surprising: recent model generations have often launched with price increases, but thankfully that's not the case here.
Here’s everything you need to know about the cost of using Opus 4.6, starting with the API pricing, which is as follows:
| Token Type | Price per 1M Tokens |
| --- | --- |
| Input (standard) | $5.00 |
| Input (cache read) | $0.50 |
| Output | $25.00 |
Next, here's the long-context pricing; these rates apply once a request's input exceeds 200K tokens:
| Token Type | Price per 1M Tokens |
| --- | --- |
| Input | $10.00 |
| Output | $37.50 |
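To make the two tiers concrete, here is a small cost estimator built from the rates above. One assumption is baked in: the sketch applies the long-context rates to the entire request once input passes 200K tokens, whereas the actual billing rules (whole-request vs overage-only) are for Anthropic's pricing docs to confirm.

```python
# Estimating an Opus 4.6 API bill from the published per-token rates.
# Standard tier: $5/M input, $25/M output.
# Long-context tier (input > 200K tokens): $10/M input, $37.50/M output.
# ASSUMPTION: the higher tier is applied to the whole request, not just
# the tokens beyond the threshold.

LONG_CONTEXT_THRESHOLD = 200_000

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a single request."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 10.00, 37.50   # long-context rates per 1M tokens
    else:
        in_rate, out_rate = 5.00, 25.00    # standard rates per 1M tokens
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 50K-token prompt with a 4K-token answer:
print(f"${estimate_cost(50_000, 4_000):.2f}")   # $0.35
```

At these rates, a 300K-token long-context request with a 10K-token reply would run about $3.38, which is why prompt caching ($0.50/M on cache reads) matters so much for heavy users.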
For a bit of context, Opus 4.6 is the most expensive model on this list. Here’s how it compares in terms of price against competitors:
| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 |
| GPT-5.2 | ~$5.00 | $15.00 |
| Gemini 3 Pro | $2.00 | $12.00 |
If you want to chat with Claude Opus 4.6 without worrying about API prices, you can head to Overchat AI and start chatting with the model as part of a single subscription, which also includes GPT-5.2, Kimi K2, the latest Gemini models, and more.
Bottom Line
Claude Opus 4.6 is Anthropic's strongest model yet. The 1M context window, context compaction, adaptive thinking: none of these features is game-changing in isolation, but they compound into a model that feels better to work with, needs fewer attempts per task, and performs even more consistently than its already very consistent predecessor.
If you want to test it for yourself, start chatting with Claude Opus 4.6 on Overchat AI today.