The AI Corner

Claude Opus 4.7 is here. Here's what actually changed

Same price. Better coding. 3x the vision. A new effort level. The full breakdown in one place.

Ruben Dominguez
Apr 16, 2026
∙ Paid

Anthropic shipped Claude Opus 4.7 today.

Same price as Opus 4.6: $5 per million input tokens, $25 per million output.

Available everywhere: claude.ai, the API, Bedrock, Vertex AI, Microsoft Foundry.

Four things changed in a meaningful way:

Benchmark comparison: Claude Opus 4.7 versus Opus 4.6, GPT-5.4, Gemini 3.1 Pro, and Mythos Preview (8 of the chart's 14 categories; — means no score was shown for that model):

Benchmark                        Opus 4.7   Opus 4.6   GPT-5.4   Gemini 3.1 Pro   Mythos Preview
Agentic coding (SWE-bench Pro)   64.3%      53.4%      57.7%     54.2%            77.8%
SWE-bench Verified               87.6%      80.8%      80.6%     —                93.9%
Terminal-Bench 2.0               69.4%      65.4%      75.1%     68.5%            82.0%
BrowseComp                       79.3%      83.7%      89.3%     85.9%            86.9%
Finance Agent                    64.4%      60.1%      61.5%     59.7%            —
GPQA Diamond                     94.2%      91.3%      94.4%     94.3%            94.6%
CharXiv (no tools)               82.1%      69.1%      —         —                86.1%
MMMLU                            91.5%      91.1%      —         —                92.6%

Opus 4.7 across 14 benchmarks: it wins on coding, vision, and financial analysis; it loses on Terminal-Bench and BrowseComp. Mythos Preview still leads almost everywhere. Source: Anthropic, April 2026.

Speaking of inference costs: the Opus 4.7 migration raises a question most AI founders struggle to answer cleanly:

What is your cost per inference call in production?

Training costs, GPU bills, API spend: most teams track those. But the per-call unit economics of actually serving a product at scale? Usually fuzzy. That number is what separates companies that scale from ones that plateau.
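For a back-of-envelope version using Opus 4.7's published pricing, here is a minimal sketch. The token counts below are hypothetical placeholders; plug in your own telemetry.

```python
# Per-call cost at Opus 4.7's published pricing:
# $5 per million input tokens, $25 per million output tokens.
INPUT_PRICE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 25.00 / 1_000_000  # dollars per output token

def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single inference call."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical call: a 3,000-token prompt producing an 800-token answer.
print(f"${cost_per_call(3_000, 800):.4f} per call")  # -> $0.0350 per call
```

Multiply by calls per day and the fuzzy number stops being fuzzy.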

Deploy by DigitalOcean is a free one-day conference in San Francisco on April 28 built entirely around answering it:

Character AI’s Chief Architect, Workato’s AI Research Lead, and CEOs from VAST Data, Arcee, and vLLM are all presenting real architectures, real cost data, and live demos of how they actually run inference in production.

If you are building AI products and this number keeps you up at night, this is the most useful afternoon you will spend this month 👇

Save your free spot!


The numbers first:

  • SWE-bench Verified: 87.6% vs 80.8% on Opus 4.6

  • SWE-bench Pro: 64.3% vs 53.4%

  • Computer use (OSWorld): 78.0% vs 72.7%

  • Visual reasoning (CharXiv): 82.1% vs 69.1%

  • Financial analysis: 64.4% vs 60.1%

  • GPQA Diamond: 94.2% vs 91.3%

In aggregate, particularly for the agentic and coding workloads where Claude has historically led, Opus 4.7 extends the gap rather than ceding ground.

One honest caveat: Terminal-Bench 2.0 is a relative weak spot. GPT-5.4 scores 75.1% there versus Opus 4.7's 69.4%, though 4.7 still improves on 4.6's 65.4%. BrowseComp is a genuine regression, down to 79.3% from Opus 4.6's 83.7%. Worth knowing if those workflows matter to you.


The 4 changes worth your attention:

1. Vision at 3x resolution

Vision at high resolution is a different model. Opus 4.7 at full resolution scores 79.5% on visual navigation without tools versus 57.7% for Opus 4.6. Source: Anthropic, April 2026.

Opus 4.7 accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels. Opus 4.6 topped out at 1.15 megapixels. Screenshots, dense diagrams, design mockups, documents: all come through at actual fidelity now.
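If you feed the model screenshots today, the practical step is to stop downscaling below the new ceiling. A minimal sketch, assuming the 2,576-pixel long-edge limit from the announcement; the helper itself is illustrative, not an official utility:

```python
# Downscale an image only if its long edge exceeds the reported
# 2,576-pixel limit; otherwise send it at full fidelity.
from PIL import Image

LONG_EDGE_LIMIT = 2576  # reported Opus 4.7 maximum on the long edge

def fit_to_limit(path: str) -> Image.Image:
    img = Image.open(path)
    long_edge = max(img.size)
    if long_edge <= LONG_EDGE_LIMIT:
        return img  # already within the new limit
    scale = LONG_EDGE_LIMIT / long_edge
    new_size = (round(img.width * scale), round(img.height * scale))
    return img.resize(new_size, Image.LANCZOS)
```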

2. Instruction following is more literal

Where Opus 4.6 interpreted instructions loosely and sometimes skipped steps, Opus 4.7 follows them to the letter. This is almost always a good thing. The practical implication: prompts written for older models occasionally produce unexpected results. Re-tune before switching production traffic.
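A hypothetical before-and-after of the retuning that helps: spell out the steps you previously trusted the model to infer.

```python
# Hypothetical prompt retune for a more literal model. Neither prompt
# comes from Anthropic; they only illustrate the direction of the change.
OLD_PROMPT = "Summarize the report and flag anything odd."

NEW_PROMPT = (
    "Summarize the report in exactly 3 bullet points. "
    "Then list every figure that changed more than 10% quarter-over-quarter. "
    "If nothing qualifies, say 'no anomalies' explicitly."
)
```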

3. A new xhigh effort level

Opus 4.7 improves modestly over Opus 4.6 on misaligned behavior. Mythos Preview remains the best-aligned model Anthropic has trained by a meaningful margin. Source: Anthropic, April 2026.

The new xhigh setting sits between high and max, giving finer control over the reasoning-latency tradeoff. Anthropic recommends starting with high or xhigh for coding and agentic use cases. Claude Code now defaults to xhigh for all plans.
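A minimal sketch of selecting the new level from the Python SDK. The model id and the effort field are assumptions based on this post, not confirmed SDK documentation; check Anthropic's docs for the real parameter shape.

```python
# Hypothetical: request xhigh effort for an agentic coding task.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-7",         # hypothetical model id
    max_tokens=4096,
    extra_body={"effort": "xhigh"},  # assumed field name and value
    messages=[{"role": "user", "content": "Fix the failing tests in this repo."}],
)
print(response.content[0].text)
```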

4. Better file-based memory

Agents that write to and read from scratchpads or notes files across long sessions get noticeably more reliable behavior. Multi-session work that previously lost context now holds it.
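The basic pattern needs no special API. A hand-rolled sketch of the scratchpad approach described above; the file name and format are arbitrary choices:

```python
# Persist agent notes to a file between sessions, then reload them
# into the prompt on the next run so context survives restarts.
import json
from pathlib import Path

NOTES = Path("agent_notes.json")

def load_notes() -> list[str]:
    return json.loads(NOTES.read_text()) if NOTES.exists() else []

def append_note(note: str) -> None:
    notes = load_notes()
    notes.append(note)
    NOTES.write_text(json.dumps(notes, indent=2))

# Session 1: record a decision.
append_note("Chose SQLite over Postgres for the prototype; revisit at 10k users.")
# Session 2: prepend accumulated notes to the next prompt.
context = "\n".join(load_notes())
```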


The cyber safeguard context:

Opus 4.7 is the first Claude model shipping with automated detection and blocking for prohibited cybersecurity uses. This comes directly from last week’s Mythos Preview and Project Glasswing.

“We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model.”

Anthropic, April 16, 2026

Security professionals doing legitimate work can apply to Anthropic’s new Cyber Verification Program.

Agentic coding performance by effort level, from Anthropic's internal autonomous agentic coding evaluation (score versus total tokens spent, 0 to 200k):

Effort    Opus 4.7             Opus 4.6
low       51%                  39%
medium    57%                  48%
high      65%                  54%
xhigh     71% (100k tokens)    — (no xhigh level)
max       74% (200k tokens)    ~61% (~120k tokens)

At every effort level, Opus 4.7 outperforms Opus 4.6's equivalent, and the new xhigh point at 100k tokens scores 71%, already ahead of Opus 4.6's max (about 61% at roughly 120k tokens). Source: Anthropic, April 2026.

What is inside the full guide:

  • The complete 14-benchmark breakdown with the honest assessment of where Opus 4.7 wins and where it does not

  • The vision upgrade in full: what 3.75 megapixels unlocks for computer use agents, document extraction, and design workflows

  • The xhigh effort level explained: when to use it, how it interacts with task budgets, and the cost math

  • Task budgets in beta: the full setup guide with recommended token ceilings per task type

  • The /ultrareview command in Claude Code: what it flags, how to use the three free reviews Anthropic is offering at launch

  • The migration guide from Opus 4.6: the two breaking changes that affect token usage and how to retune prompts

  • The new tokenizer explained: why the same input produces up to 1.35x more tokens and how to manage it

  • Auto mode for Max users: what it does and when it saves you from interruptions on longer tasks


Opus 4.7 ✨ Complete Guide:

Keep reading with a 7-day free trial

Subscribe to The AI Corner to keep reading this post and get 7 days of free access to the full post archives.
