Introduction
Opus 4.6 is the new gold standard among coding models. On Terminal-Bench 2.0, it holds the top spot with 65.4%, and on Humanity's Last Exam, it scores 53.1%.
Anthropic's coding models are generally considered the best in the world, and this one is no exception. It sets a new standard even without the 1M context window, which is currently in beta and available only to API users.
What is Claude Opus 4.6?
Claude Opus 4.6 is Anthropic's most advanced AI model, released on February 5, 2026, as an upgrade to Opus 4.5 (which launched in November 2025).
The headline feature is the one-million-token context window (in beta), a fivefold increase over Opus 4.5's 200K limit that places it on par with Google's Gemini 3 Pro. Many users have long noted that Claude models excel at design and front-end coding but lose track of long sessions because of the limited window. With the 1M window rolling out, that limitation is on its way to becoming a thing of the past.
Claude Opus 4.6 Main Features
An increased context window isn’t the only new feature of this model; there are other improvements as well. Most of these improvements only matter for enterprise use cases, but some are fundamental to how the model works. You'll surely notice these improvements when chatting with the model, so let's break them down.
Adaptive Thinking — This replaces the old on/off reasoning toggle. Similar to ChatGPT, Claude now answers simple prompts right away and activates extended reasoning for complex ones. In testing, I found the switching iffy: it sometimes turns on reasoning for tasks I'd consider very simple.
Agent Teams — A preview feature for enterprise and power users: multiple Claude instances work in parallel on different parts of a project.
Context Compaction — A server-side summarization feature. When the context window is close to full, Claude condenses the information it holds so that long-running tasks don't stop.
128K Max Output — The amount of text the model can output has been doubled.
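The features above are mostly API-facing, so here is what a request touching them might look like. This is a minimal sketch only: the model ID `claude-opus-4-6` and the `anthropic-beta` header value are assumptions, not confirmed identifiers — check Anthropic's documentation for the real names. Note there is no explicit thinking flag, since adaptive thinking is decided by the model itself.

```python
# Sketch of a Messages API request payload for Opus 4.6.
# ASSUMPTIONS: the model ID and the beta header value are placeholders;
# the real identifiers live in Anthropic's API docs.

def build_request(prompt: str, long_context: bool = False) -> dict:
    """Assemble a request payload for the (hypothetical) Opus 4.6 model."""
    request = {
        "model": "claude-opus-4-6",   # assumed model ID
        "max_tokens": 128_000,        # the new, doubled output ceiling
        "messages": [{"role": "user", "content": prompt}],
    }
    if long_context:
        # The 1M window is beta-gated for API users; this header name
        # is illustrative, not the documented one.
        request["extra_headers"] = {"anthropic-beta": "context-1m"}
    return request

req = build_request("Summarize this repository.", long_context=True)
```

A standard request simply omits the beta header and runs against the default 200K window.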
On Reddit, early reviewers say Opus 4.6 is impressive at drafting and analyzing legal documents. Its writing has improved as well: it leans on fewer stock phrases and draws on a broader vocabulary.
In short, it's a solid step up across the board.
Claude Opus 4.6 Benchmarks
Benchmark performance gives a rough sense of where Opus 4.6 sits, but the real-world improvements — better planning, self-correction, sustained focus — are what developers are reporting.
Coding:
| Benchmark | Opus 4.6 Score |
| --- | --- |
| Terminal-Bench 2.0 | 65.4% |
| SWE-Bench Verified | 80.8% |
| OSWorld (Computer Use) | 72.7% |
| τ2-Bench Retail | 91.9% |
| MCP Atlas | 59.5% |
Reasoning and knowledge:
| Benchmark | Claude Opus 4.6 |
| --- | --- |
| HLE (with tools) | 53.1% |
| HLE (without tools) | 40.0% |
| GDPval-AA | 1606 Elo |
| BrowseComp | 84.0% |
| ARC AGI 2 | 68.8% |
| BigLaw Bench | 90.2% |
| Finance Agent | 60.7% |
Long context retention (higher is better):
| Benchmark | Opus 4.6 Score | Sonnet 4.5 Score |
| --- | --- | --- |
| MRCR v2 (1M, 8-needle) | 76% | 18.5% |
| MRCR v2 (256K, 8-needle) | 93% | 10.8% |
Claude Opus 4.6 vs Other AI Models
Let’s see how the new model compares against other top models — both from Anthropic and competitors.
Claude Opus 4.6 vs Opus 4.5
Opus 4.6 improves on Opus 4.5 across every benchmark except SWE-Bench Verified, where the two are essentially tied (80.8% vs 80.9%).
| Benchmark | Opus 4.6 | Opus 4.5 | Improvement |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 59.8% | +5.6pp |
| OSWorld | 72.7% | 66.3% | +6.4pp |
| ARC AGI 2 | 68.8% | 37.6% | +31.2pp |
| GDPval-AA | 1606 Elo | ~1416 Elo | +190 Elo |
| Context Window | 1M (beta) | 200K | 5x increase |
Claude Opus 4.6 vs GPT-5.2
Compared to GPT-5.2, Opus 4.6 wins pretty much across the board, though it's worth noting that OpenAI's model prices output tokens lower: $15/M vs $25/M.
| Benchmark | Opus 4.6 | GPT-5.2 | Winner |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 64.7% | Opus 4.6 |
| GDPval-AA | 1606 Elo | ~1462 Elo | Opus 4.6 |
| HLE (with tools) | 53.1% | ~42% | Opus 4.6 |
| BrowseComp | 84.0% | Lower | Opus 4.6 |
| SWE-Bench Verified | 80.8% | 80.0% | Opus 4.6 |
| MCP Atlas | 59.5% | 60.6% | GPT-5.2 |
Claude Opus 4.6 vs Gemini 3 Pro
Gemini 3 Pro is the first model on this list that beats Opus 4.6 in meaningful ways, specifically in reasoning and context window size. It is, however, the weaker coder.
| Benchmark | Opus 4.6 | Gemini 3 Pro | Winner |
| --- | --- | --- | --- |
| Terminal-Bench 2.0 | 65.4% | 56.2% | Opus 4.6 |
| OSWorld | 72.7% | Lower | Opus 4.6 |
| GPQA Diamond | ~85% | 91.9% | Gemini 3 Pro |
| Context Window | 1M (beta) | 2M | Gemini 3 Pro |
Claude Opus 4.6 Pricing
Anthropic kept pricing identical to Opus 4.5, which is great given the performance gains, and a bit surprising: recent model generations have often launched with price increases, but thankfully that's not the case here.
Here’s everything you need to know about the cost of using Opus 4.6, starting with the API pricing, which is as follows:
| Token Type | Price per 1M Tokens |
| --- | --- |
| Input (standard) | $5.00 |
| Input (cache read) | $0.50 |
| Output | $25.00 |
Next, here's the long-context pricing; these rates apply once a request's input exceeds 200K tokens:
| Token Type | Price per 1M Tokens |
| --- | --- |
| Input | $10.00 |
| Output | $37.50 |
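To make the two tiers concrete, here is a small cost estimator built from the rates above. One assumption is baked in: the sketch applies the long-context rates to the entire request once input passes 200K tokens, whereas the actual billing rules (whole-request vs overage-only) are for Anthropic's pricing docs to confirm.

```python
# Estimating an Opus 4.6 API bill from the published per-token rates.
# Standard tier: $5/M input, $25/M output.
# Long-context tier (input > 200K tokens): $10/M input, $37.50/M output.
# ASSUMPTION: the higher tier is applied to the whole request, not just
# the tokens beyond the threshold.

LONG_CONTEXT_THRESHOLD = 200_000

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a single request."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 10.00, 37.50   # long-context rates per 1M tokens
    else:
        in_rate, out_rate = 5.00, 25.00    # standard rates per 1M tokens
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 50K-token prompt with a 4K-token answer:
print(f"${estimate_cost(50_000, 4_000):.2f}")   # $0.35
```

At these rates, a 300K-token long-context request with a 10K-token reply would run about $3.38, which is why prompt caching ($0.50/M on cache reads) matters so much for heavy users.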
For a bit of context, Opus 4.6 is the most expensive model on this list. Here’s how it compares in terms of price against competitors:
| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 |
| GPT-5.2 | ~$5.00 | $15.00 |
| Gemini 3 Pro | $2.00 | $12.00 |
If you want to chat with Claude Opus 4.6 without worrying about API prices, you can head to Overchat AI and start chatting with the model as part of a single subscription, which also includes GPT-5.2, Kimi K2, the latest Gemini models, and more.
Bottom Line
Claude Opus 4.6 is Anthropic's strongest model yet. The 1M context window, context compaction, adaptive thinking: none of these features is game-changing in isolation, but they compound into a model that feels better to work with, needs fewer attempts per task, and performs even more consistently than its already very consistent predecessor.
If you want to test it for yourself, start chatting with Claude Opus 4.6 on Overchat AI today.