Claude Opus 4.7 is Now on Overchat AI — Anthropic's Most Advanced AI Model
Last Updated:
Apr 17, 2026


Anthropic released Claude Opus 4.7 on April 16, 2026, and it's a direct upgrade to Opus 4.6 that pushes the model's lead in coding and agentic work further than expected.

Among the updates are a new xhigh effort level, substantially better vision, and measurable gains on AI coding benchmarks. The new model is already live on Overchat AI, and you can start chatting with Claude Opus 4.7.

TL;DR

  • Claude Opus 4.7 launched on April 16, 2026 as a direct upgrade to Opus 4.6 and is Anthropic's new flagship generally available model.
  • Pricing is unchanged from Opus 4.6 at $5 per million input tokens and $25 per million output tokens, but the model uses a new tokenizer that maps the same text to up to 35% more tokens, leading to higher real-world costs in some cases.
  • Opus 4.7 scores 64.3% on SWE-bench Pro, compared to 53.4% for Opus 4.6 and 57.7% for GPT-5.4.
  • Vision is much improved — the new model processes images at more than three times the previous resolution, up to ~3.75 MP.
  • Instruction following is noticeably more literal than in Opus 4.6, meaning the model follows prompts more exactly across general work.

What is Claude Opus 4.7?

Released on April 16, 2026, Claude Opus 4.7 is Anthropic's latest flagship model, positioned as an advanced software engineering and long-term planning agent. Opus 4.6 was widely regarded as one of the best AI coding models, and Opus 4.7 builds on that capability.

One of the most important benchmarks for measuring coding performance is SWE-bench Pro, where Opus 4.7 achieved a score of 64.3% — up from the 53.4% scored by Opus 4.6. For comparison, GPT-5.4 scored 57.7%. This benchmark measures an AI agent's ability to solve real-world engineering problems, many of which are based on challenging GitHub issues.

Opus 4.7 Features

While most features carry over from the previous generation, including the 1M context window, there are three key areas where this model improves on its predecessor.

More accurate instruction following

Opus 4.7 interprets prompts more literally. While this is an improvement in capability, Anthropic explicitly warns that prompts tuned for Opus 4.6 — which sometimes interpreted instructions loosely or skipped parts — may now produce unexpected outputs.

Better vision

Opus 4.7 can process images with up to 2,576 pixels on the long edge, which equates to approximately 3.75 megapixels — more than three times what earlier Claude models could handle. This is a model-level change, not an API parameter, so it applies automatically. In practice, the model can understand images far more accurately, and the improvement is significant. On XBOW's visual acuity benchmark, used for autonomous penetration testing, Opus 4.7 scored 98.5% compared to 54.5% for Opus 4.6.
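A quick sanity check of those resolution figures: the megapixel count depends on the image's aspect ratio, which the article doesn't state, so the sketch below assumes roughly 16:9 as an illustration.

```python
# Check of the article's resolution figures: a 2,576 px long edge at a
# 16:9 aspect ratio (an assumption; megapixels depend on the image's
# shape) works out to roughly 3.7 MP, in line with the quoted ~3.75 MP.

long_edge = 2576
short_edge = round(long_edge * 9 / 16)        # 1,449 px
megapixels = long_edge * short_edge / 1e6     # ~3.73 MP
```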

More accurate long-term memory

Opus 4.7 is better at reading and writing notes across long, multi-session work, so follow-up tasks require less context loaded up front.

Beyond the core upgrades, there are several features worth knowing about — a mix of model behaviors and platform additions that shipped alongside the release.

xhigh effort level

Previously, developers had low, medium, high, and max. Opus 4.7 adds xhigh between high and max, giving finer control over the reasoning-vs-latency tradeoff. Anthropic recommends starting at high or xhigh for coding tasks.
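As a rough illustration of how a request might select an effort level, here is a minimal sketch. The model id (`claude-opus-4-7`) and the `effort` field are assumptions based on this article, not confirmed API parameters; check Anthropic's API documentation for the actual request shape.

```python
# Hypothetical request payload with an explicit effort level.
# "effort" and the model id are assumptions from this article.

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a chat request payload with an explicit effort level."""
    levels = ("low", "medium", "high", "xhigh", "max")  # xhigh is new in Opus 4.7
    if effort not in levels:
        raise ValueError(f"effort must be one of {levels}, got {effort!r}")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

# Anthropic recommends high or xhigh for coding tasks.
request = build_request("Refactor this function for clarity.", effort="xhigh")
```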

Task budgets

Developers can now cap Claude's token spend per task, letting the model prioritize across longer runs.

Auto Mode for Max users

Previously limited to Teams/Enterprise/API, Auto Mode in Claude Code now lets Claude make permission decisions on its own during long tasks.

Self-verification behavior

Opus 4.7 designs ways to verify its own outputs before reporting back — something many early testers have noticed. For example, the Vercel team tested the model before release and reported that it "does proofs on systems code before starting work," which is new behavior. In practice, this means fewer confident-but-wrong outputs.

Opus 4.7 is more opinionated

Opus 4.7 now pushes back in discussions rather than simply agreeing with the user — like a good coworker should. Worth noting if you've been frustrated by sycophantic AI responses.

Claude Opus 4.7 Benchmarks

Note: Anthropic flags that SWE-bench Verified, Pro, and Multilingual contain a subset of problems that show signs of memorization — they've confirmed that Opus 4.7's margin over 4.6 holds even when those are excluded.

Coding:

SWE-bench Pro 64.3%
Rakuten-SWE-Bench 3× more production tasks resolved than Opus 4.6

Real-world knowledge:

BigLaw Bench (Harvey, high effort) 90.9%
Databricks OfficeQA Pro 21% fewer errors than Opus 4.6

Vision:

XBOW Visual Acuity 98.5% (vs. 54.5% for Opus 4.6)
Max image resolution ~3.75 MP (2,576 px long edge)

Claude Opus 4.7 vs Other AI Models

Let's see how Opus 4.7 stacks up against the competition — both Anthropic's own models and the main rivals.

Claude Opus 4.7 vs Opus 4.6

Not every workload saw dramatic gains, but improvements are most significant on difficult tasks, making Opus 4.7 a strong upgrade over 4.6.

Benchmark Opus 4.7 Opus 4.6 Improvement
SWE-bench Pro 64.3% 53.4% +10.9pp
CursorBench ~70% ~58% +12pp
XBOW Visual Acuity 98.5% 54.5% +44pp

Claude Opus 4.7 vs GPT-5.4

GPT-5.4 is OpenAI's current top-tier model. Anthropic's own comparison chart places Opus 4.7 ahead of GPT-5.4 on most benchmarks, with SWE-bench Pro showing the clearest margin.

Benchmark Opus 4.7 GPT-5.4 Winner
SWE-bench Pro 64.3% 57.7% Opus 4.7

Claude Opus 4.7 vs Claude Mythos

Claude Mythos is Anthropic's most capable internal model and a possible Opus 5 variant. Reportedly, it is so advanced that releasing it publicly would pose a risk to global cybersecurity: in controlled testing, it found multiple old bugs in software like Firefox that let malicious scripts escape the execution environment and take complete control of a PC in just a couple of hours. Anthropic decided not to release the model until it can ensure doing so is safe.

To be clear, Opus 4.7 is less advanced than Mythos. For instance, on SWE-bench Pro, Mythos Preview scores 77.8% vs. Opus 4.7's 64.3%.

How to access Claude Mythos? Right now, Mythos is only available to a limited set of partners under Project Glasswing — a group of enterprises given early access to use the model to find and fix vulnerabilities in the internet's most critical software before the model is released publicly.

Claude Opus 4.7 Pricing

Per-token pricing is unchanged from Opus 4.6.

Token Type Price per 1M Tokens
Input (standard) $5.00
Input (cache read) $0.50
Output $25.00

But there's a catch. Opus 4.7 ships with a new tokenizer that maps the same text to roughly 1.0–1.35× as many tokens as Opus 4.6, depending on content type. On top of that, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings, which means more output tokens.

In other words, the sticker price is the same, but the actual per-request cost on Opus 4.7 will be higher. It's best to measure the difference on real tasks to understand how the new tokenizer will affect cost for your particular workloads.
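To make the effect concrete, here is a back-of-the-envelope sketch using the article's figures: identical per-token rates, but up to 1.35× as many tokens for the same text under the new tokenizer (the example token counts are illustrative, not measurements).

```python
# Rough per-request cost comparison between Opus 4.6 and Opus 4.7 at
# the same rates ($5/M input, $25/M output), assuming the worst-case
# 1.35x token inflation from the new tokenizer.

INPUT_PER_M = 5.00
OUTPUT_PER_M = 25.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single request at Opus pricing."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# The same text, tokenized by each model.
cost_46 = request_cost(100_000, 10_000)    # $0.75
cost_47 = request_cost(135_000, 13_500)    # $1.0125
increase = (cost_47 / cost_46 - 1) * 100   # 35% higher at the same sticker price
```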

To offset the potentially higher cost, Opus supports prompt caching that delivers up to 90% savings.
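A minimal sketch of what a cached request can look like, following the `cache_control` block shape from Anthropic's prompt-caching API; the model id is an assumption from this article, and the exact request format may differ, so verify against the official docs.

```python
# Sketch of a request that marks a large, reusable system prompt for
# caching. Cached input reads are billed at $0.50/M vs $5.00/M for
# standard input, i.e. the "up to 90% savings" mentioned above.

def cached_request(system_prompt: str, user_msg: str) -> dict:
    """Build a payload whose system prompt is cached across requests."""
    return {
        "model": "claude-opus-4-7",  # assumed model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Everything up to this marker is cached and re-read
                # at the discounted rate on subsequent requests.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }
```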

Here's how Opus 4.7 compares against competitors on price:

Model Input / 1M Output / 1M
Claude Opus 4.7 $5.00 $25.00
GPT-5.4 ~$5.00 ~$15.00
Gemini 3.1 Pro $2.00 $12.00

Opus 4.7 is the most expensive of these flagship models. As before, if you want to skip the API math entirely, you can chat with Claude Opus 4.7 on Overchat AI as part of a single subscription that also includes GPT-5.4, Gemini 3.1 Pro, Kimi K2.5, and more.

How to migrate from Opus 4.6 to Opus 4.7

If you're upgrading an existing workflow, there are three things worth planning for:

  • Re-tune your prompts. Because Opus 4.7 follows instructions more literally, prompts that worked on Opus 4.6 may produce unexpected results.
  • Measure token usage. Between the new tokenizer and increased output at high effort, the model will most likely cost more for the same tasks — run a cost comparison to see how this affects you.
  • Start at high or xhigh effort. Anthropic recommends these levels for coding tasks, and Claude Code already defaults to xhigh.

Bottom Line

Claude Opus 4.7 is a significant upgrade over Opus 4.6, particularly for AI coding. If you want to test Opus 4.7 on your own workflows today, head to Overchat AI and start chatting with Claude Opus 4.7.

Key Takeaways:

  • Opus 4.7 is a direct upgrade to Opus 4.6, released on April 16, 2026. It is Anthropic's new flagship generally available model.
  • The biggest gains are in coding and agentic work. On SWE-bench Pro, Opus 4.7 scores 64.3% vs. 53.4% for Opus 4.6 and 57.7% for GPT-5.4. CursorBench jumps from ~58% to ~70%.
  • Instruction following is more literal. Opus 4.7 follows prompts more exactly than Opus 4.6 — meaning fewer unexpected deviations and less unneeded creativity, but also meaning that prompts tuned for Opus 4.6 may not work as well with the new model.
  • Vision is dramatically better. Images are now processed at up to 2,576 pixels on the long edge — more than 3× the previous limit. In practice, visual capability has nearly doubled, as shown by XBOW's visual-acuity benchmark jumping from 54.5% to 98.5%.
  • A new xhigh effort level sits between high and max, giving finer control over the reasoning-vs-latency tradeoff.
  • The tokenizer changed. Despite identical per-token rates, the new tokenizer maps the same text to 1.0–1.35× more tokens, and the model thinks more at higher effort levels — which means that in practice Claude Opus 4.7 is more expensive to run than Claude Opus 4.6.
  • Claude Opus 4.7 is more reliable. Notion has reported that Opus 4.7 completes tasks 14% more reliably, and Hex noted a new behavior: the model pushes back more often and flags when data it needs is missing instead of fabricating plausible fallbacks.

Opus 4.7 is the top AI coding model on benchmarks that measure real-world task completion, such as SWE-bench Pro. That said, Gemini 3.1 Pro offers a larger context window (2M), and both Gemini 3.1 Pro and GPT-5.4 come in at lower API pricing.