What is Claude Opus 4.8?
Anthropic shipped Opus 4.8 on May 28, 2026, six weeks after Opus 4.7 — the fastest cadence between flagship Opus releases yet. The headline isn't a single benchmark; it's that 4.8 is, by Anthropic's own measure, four times less likely than 4.7 to let a flawed line of code slip through review. For teams letting Claude write production PRs, that's the number that matters most.
4.8 sits one tier below Claude Mythos, the larger model Anthropic has been previewing inside cybersecurity organizations and plans to roll out broadly in the coming weeks. What's notable is that 4.8's alignment scores already match Mythos Preview — meaning the safety upgrade landed before the raw capability one did.
What changed under the hood
Five effort levels, not one toggle. Opus 4.7 had a single extended-thinking switch. 4.8 exposes Low, Medium, High (default), xHigh, and Max — a real dial. Low buys you a snappy chat reply; Max pushes the model into a multi-minute reasoning pass that reaches further than xHigh on 4.7 by a measurable margin on the hardest problems. You spend tokens where you actually need them, not on every reply.
Dynamic workflows in Claude Code. A single Opus 4.8 agent can now spawn hundreds of subagents in parallel, each with its own context and its own task. For a monorepo refactor, a security audit across a hundred files, or a research fan-out, the wall-clock difference vs the serial 4.7 flow is often the difference between "useful" and "shippable today." The orchestrator stays one model; the subagents do the work.
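The fan-out pattern described above can be sketched in a few lines of Python. This is a minimal illustration of one orchestrator dispatching many independent tasks in parallel, not Claude Code's actual interface: `run_subagent`, `fan_out`, and the `max_parallel` cap are hypothetical names, and the subagent here only sleeps to simulate work.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one subagent; a real one would call a model."""
    await asyncio.sleep(0.01)  # simulate the subagent doing its work
    return f"done: {task}"


async def fan_out(tasks: list[str], max_parallel: int = 50) -> list[str]:
    """Run every task concurrently, with at most max_parallel in flight."""
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> str:
        async with limit:
            return await run_subagent(task)

    # gather returns results in the same order as the input tasks
    return await asyncio.gather(*(guarded(t) for t in tasks))


if __name__ == "__main__":
    # Hypothetical workload: one task per file in a hundred-file audit.
    files = [f"src/module_{i}.py" for i in range(100)]
    results = asyncio.run(fan_out(files))
    print(len(results), results[0])
```

The semaphore is the design choice worth noting: it keeps the orchestrator from launching every task at once, which matters when each subagent consumes rate-limited capacity.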
Fast mode at one-third the previous price. 4.8's Fast mode runs roughly 2.5× faster than the standard endpoint at $10 input / $50 output per million tokens, one-third the price of the prior generation's Fast tier. Standard pricing stays at $5 / $25 per million tokens, unchanged from 4.7, so Fast mode costs twice the standard rate per token. For chat-style use where a user is waiting on the reply, Fast mode is now the default to reach for.
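To see what those prices mean per request, here is a rough cost estimate in Python. The per-million-token rates are the ones quoted above; the `request_cost` helper and the 2,000-in / 500-out workload are assumptions for illustration only.

```python
# Prices in USD per million tokens, as quoted in this article.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def request_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request in the given mode."""
    rate = PRICES[mode]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


# Hypothetical chat reply: 2,000 input tokens and 500 output tokens.
standard = request_cost("standard", 2_000, 500)
fast = request_cost("fast", 2_000, 500)
print(f"standard: ${standard:.4f}  fast: ${fast:.4f}")
# prints "standard: $0.0225  fast: $0.0450"
```

For this workload the Fast-mode reply costs $0.0450 against $0.0225 on the standard endpoint, so the trade is a fixed price premium for the roughly 2.5× speedup.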
Where 4.8 pulls ahead. On SWE-bench Pro, 4.8 lands at 69.2% — up from 64.3% on 4.7, and 10+ points clear of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%). Computer use on OSWorld-Verified ticks up to 83.4%. The GDPval-AA Elo for economic knowledge work jumps from 1753 to 1890, a bigger generational delta than 4.7 had over 4.6. On Humanity's Last Exam with tools it scores 57.9%, beating GPT-5.5 (52.2%) and Gemini 3.1 Pro (51.4%) by a comfortable margin.
Where it doesn't. The clearest loss is Terminal-Bench 2.1, where 4.8 scores 74.6% versus GPT-5.5's 78.2%. If your workflow is mostly raw terminal coding with no planning and few tools, GPT-5.5 still has the edge. On most other agentic benchmarks 4.8 is back in front, but the gap is worth stating plainly rather than glossing over.
What to actually use it for. For agentic engineering, Claude Code dynamic workflows, computer-use automations, financial analysis (Finance Agent v2: 53.9%), and any task where you'd rather the model say "I'm not sure" than confidently bluff, 4.8 is currently the strongest choice on the market. For high-volume chat where latency matters more than the last few percentage points, Fast mode at one-third the prior cost makes it viable for production workloads it wasn't quite right for before. On Overchat AI, you can start chatting with Opus 4.8 immediately after creating a free account — no API key required.
How Opus 4.8 stacks up against the field
Against Opus 4.7. The numbers favour 4.8 across nearly every workload Anthropic publishes. SWE-bench Pro moves from 64.3% to 69.2%, OSWorld-Verified from 82.8% to 83.4%, Humanity's Last Exam (no tools) from 46.9% to 49.8%, Finance Agent v2 from 51.5% to 53.9%, and GDPval-AA from 1753 to 1890. The real qualitative shift, though, is the honesty work: where 4.7 would sometimes write code that looked right and let a subtle flaw through, 4.8 is, by Anthropic's measure, four times less likely to let one slip through review. The five-level effort dial and the cheaper Fast mode are also new in 4.8. If you're already on 4.7, the upgrade is more or less free (same standard pricing) and pays back fastest on agent-driven code review.
Against GPT-5.5 and Gemini 3.1 Pro. On the agentic benchmarks that decide most Claude Code and computer-use workloads, 4.8 has a real lead: SWE-bench Pro is 10.6 points clear of GPT-5.5 and 15 ahead of Gemini 3.1 Pro; OSWorld-Verified sits 4.7 points above GPT-5.5; Humanity's Last Exam with tools leads by 5.7 over GPT-5.5. GDPval-AA economic Elo for Opus 4.8 is 1890 against GPT-5.5's 1769 and Gemini 3.1 Pro's 1314. The honest exception is Terminal-Bench 2.1, where GPT-5.5 (78.2%) still beats 4.8 (74.6%) by 3.6 points. If your stack is mostly bash-driven CI work without planning, that gap is worth knowing. On pricing, 4.8 standard is $5 / $25 per million input/output tokens, and GPT-5.5 remains cheaper at the entry tier. 4.8's new Fast mode at $10 / $50 costs twice the standard rate, so what it offers latency-sensitive workloads is speed rather than a smaller price gap.
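The margins quoted above follow directly from the scores given earlier in this article. A short Python check, assuming only those published figures (the `scores` table and `margin` helper are illustrative names):

```python
# Benchmark scores in percent, as quoted in this article.
scores = {
    "SWE-bench Pro": {"Opus 4.8": 69.2, "GPT-5.5": 58.6, "Gemini 3.1 Pro": 54.2},
    "HLE (with tools)": {"Opus 4.8": 57.9, "GPT-5.5": 52.2, "Gemini 3.1 Pro": 51.4},
    "Terminal-Bench 2.1": {"Opus 4.8": 74.6, "GPT-5.5": 78.2},
}


def margin(benchmark: str, rival: str, baseline: str = "Opus 4.8") -> float:
    """Baseline score minus rival score, in points; negative means the rival leads."""
    row = scores[benchmark]
    return round(row[baseline] - row[rival], 1)


for benchmark, row in scores.items():
    for rival in row:
        if rival != "Opus 4.8":
            print(f"{benchmark}: {margin(benchmark, rival):+.1f} vs {rival}")
```

A negative margin, as on Terminal-Bench 2.1, marks a benchmark where the rival model is ahead.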
Where Opus 4.8 fits in the lineup. Opus is Anthropic's flagship tier, sitting above Claude Sonnet (the balanced general-purpose model) and Claude Haiku (fast, high-volume). Above Opus 4.8, the larger Claude Mythos model is currently in a controlled preview inside cybersecurity organizations and is expected to land broadly in the coming weeks. All three Claude tiers are available on Overchat AI, so you can match the model to the task without juggling subscriptions.