Claude Sonnet 5

Models

Anthropic's June 2026 agentic Sonnet model, combining a 1M-token context window, near-Opus performance, and lower production pricing.

A highly capable staff engineer assigned to the everyday queue: cheaper than the specialist architect, strong enough to finish most jobs, and smart enough to escalate the few that need deeper expertise.

Claude Sonnet 5 is Anthropic's June 2026 mid-tier model for agentic coding, tool use, computer interaction, and professional knowledge work. Anthropic positions it as the most agentic Sonnet release yet: substantially stronger than Claude Sonnet 4.6 and close enough to Claude Opus 4.8 that cost, latency, and workload shape become the main routing decisions.

Core profile

Sonnet 5 is the default model for Claude Free and Pro plans and is also available to Max, Team, and Enterprise users. Developers can use it in Claude Code or through the Claude API with the model identifier claude-sonnet-5. GitHub Copilot added the model on launch day, and Anthropic supports it across its broader cloud platform ecosystem.

The model has a 1 million token context window by default and supports up to 128,000 output tokens on standard requests. Anthropic's Message Batches API can raise the output ceiling to 300,000 tokens using the existing output beta. Adaptive thinking is the recommended operating mode, with effort controls that trade token use and latency against capability.

Pricing starts at USD 2 per million input tokens and USD 10 per million output tokens through August 31, 2026. Standard pricing then becomes USD 3 input and USD 15 output. That makes Sonnet 5 40% of Opus 4.8's list price during the introductory window and 60% at standard pricing.
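To make the arithmetic concrete, here is a minimal sketch of a per-request cost calculator built on the rates above. The function and constant names are illustrative. The Opus 4.8 rates are not quoted in this entry; they are back-derived from the 40% and 60% ratios and should be treated as an assumption.

```python
# Rates in USD per million tokens, taken from the paragraph above.
SONNET_5_INTRO = {"input": 2.00, "output": 10.00}     # through August 31, 2026
SONNET_5_STANDARD = {"input": 3.00, "output": 15.00}
# Assumption: implied by the 40% / 60% ratios, not quoted directly.
OPUS_4_8_IMPLIED = {"input": 5.00, "output": 25.00}


def request_cost(rates, input_tokens, output_tokens):
    """Return the USD cost of one request at the given per-million rates."""
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200k input tokens and 8k output tokens.
    for name, rates in [
        ("Sonnet 5, intro", SONNET_5_INTRO),
        ("Sonnet 5, standard", SONNET_5_STANDARD),
        ("Opus 4.8, implied", OPUS_4_8_IMPLIED),
    ]:
        print(f"{name}: ${request_cost(rates, 200_000, 8_000):.2f}")
```

Because the input and output rates scale by the same factor, the 40% and 60% ratios hold for any mix of input and output tokens, before accounting for tokenizer differences.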

The tokenizer caveat

Sonnet 5 uses a new tokenizer. Anthropic says the same text can map to roughly 1.0 to 1.35 times as many tokens as it did with Sonnet 4.6, depending on the content type. The introductory pricing is designed to make migration roughly cost-neutral, but teams should still measure cost per completed task rather than comparing rate cards alone.

The model may consume more tokens for the same input while needing fewer turns to finish a complex job. Production traces, not nominal token prices, determine which effect wins.
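One way to run that comparison is to aggregate logged traces into cost per completed task. The sketch below assumes a hypothetical trace record with token counts and a success flag. The field names are placeholders, and the rates and token counts in the usage example are made up to show the calculation, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One logged task attempt (hypothetical schema)."""
    input_tokens: int    # summed over every turn of the task
    output_tokens: int
    completed: bool      # did the task pass its verifier?


def cost_per_completed_task(traces, input_rate, output_rate):
    """Total spend divided by the number of completed tasks.

    Rates are USD per million tokens. Failed attempts still count toward
    spend, because they are paid for even though they deliver nothing.
    """
    spend = sum(
        (t.input_tokens * input_rate + t.output_tokens * output_rate) / 1_000_000
        for t in traces
    )
    done = sum(1 for t in traces if t.completed)
    if done == 0:
        return float("inf")
    return spend / done


if __name__ == "__main__":
    # Illustrative numbers only: the "new" traces use 20% more tokens per
    # task but complete more often, at placeholder per-million rates.
    old = [Trace(100_000, 10_000, True), Trace(100_000, 10_000, False)]
    new = [Trace(120_000, 12_000, True), Trace(120_000, 12_000, True)]
    print(cost_per_completed_task(old, 3.0, 15.0))
    print(cost_per_completed_task(new, 2.0, 10.0))
```

Dividing by completed tasks rather than by requests is the design choice that matters here: it lets a higher per-request token count be offset by a higher completion rate or fewer retries.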

Official benchmark profile

Anthropic's Sonnet 5 system card reports the following launch results. Claude scores generally use adaptive thinking at maximum effort and are averaged over multiple trials. Competitor scores come from published provider results or benchmark leaderboards, so the comparison is directional rather than perfectly controlled.

| Benchmark | Sonnet 4.6 | Sonnet 5 | GPT-5.5 | Gemini 3.5 Flash |
| --- | --- | --- | --- | --- |
| SWE-bench Pro | 58.1% | 63.2% | 58.6% | 55.1% |
| Terminal-Bench 2.1 | 67.0% | 80.4% | 83.4% | 76.2% |
| BrowseComp, single agent | 76.2% | 84.7% | 84.4% | Not reported |
| Humanity's Last Exam, no tools | 34.6% | 43.2% | 41.4% | 40.2% |
| Humanity's Last Exam, with tools | 46.8% | 57.4% | 52.2% | Not reported |
| OSWorld-Verified | 78.5% | 81.2% | 78.7% | 78.4% |
| FrontierCode v1 | 15.1% | 38.8% | 25.5% | Not reported |
| GDPval-AA v2, Elo | 1395 | 1618 | 1509 | 1357 |

The benchmark pattern is mixed in a useful way. Sonnet 5 beats GPT-5.5 on SWE-bench Pro, BrowseComp, both Humanity's Last Exam settings, OSWorld-Verified, FrontierCode, and GDPval-AA v2, although the BrowseComp margin is only 0.3 points. GPT-5.5 retains a 3-point lead on Terminal-Bench 2.1. Sonnet 5 remains below Opus 4.8 on SWE-bench Pro but is statistically tied with it on GDPval-AA v2, where Sonnet 5 scores 1618 Elo and Opus 4.8 scores 1615.
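The head-to-head claims can be checked directly against the table. This sketch hard-codes the Sonnet 5 and GPT-5.5 columns from above and reports the margin on each benchmark; the variable and function names are illustrative.

```python
# (Sonnet 5, GPT-5.5) scores copied from the table above.
# Every row is a percentage except GDPval-AA v2, which is an Elo rating.
SCORES = {
    "SWE-bench Pro": (63.2, 58.6),
    "Terminal-Bench 2.1": (80.4, 83.4),
    "BrowseComp, single agent": (84.7, 84.4),
    "Humanity's Last Exam, no tools": (43.2, 41.4),
    "Humanity's Last Exam, with tools": (57.4, 52.2),
    "OSWorld-Verified": (81.2, 78.7),
    "FrontierCode v1": (38.8, 25.5),
    "GDPval-AA v2, Elo": (1618, 1509),
}


def margins(scores):
    """Return {benchmark: Sonnet 5 minus GPT-5.5}, rounded to one decimal."""
    return {name: round(s5 - gpt, 1) for name, (s5, gpt) in scores.items()}


if __name__ == "__main__":
    for name, delta in margins(SCORES).items():
        leader = "Sonnet 5" if delta > 0 else "GPT-5.5"
        print(f"{name:34s} {delta:+8.1f}  ({leader} leads)")
```

Since the competitor figures come from separate published sources, small margins such as the one on BrowseComp are best read as directional rather than as a settled ranking.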

How it compares with Sonnet 4.6 and Opus 4.8

Sonnet 5 is a clear upgrade over Sonnet 4.6. The largest practical gains are in terminal work, long-running coding, tool use, computer interaction, and following a complex task through to completion rather than stopping after the first successful-looking step.

Against Opus 4.8, the picture is more mixed. Sonnet 5 costs substantially less and matches Opus on some knowledge-work and agentic tasks. Opus remains the stronger escalation model for the hardest reasoning, coding, and high-risk review tasks. A sensible production system routes routine and moderately difficult work to Sonnet, then escalates failures or high-value decisions to Opus.
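A minimal sketch of that routing pattern follows. It is not Anthropic's implementation: `run_model`, the `Result` type, the `high_value` flag, and the `claude-opus-4-8` identifier are all hypothetical stand-ins (only `claude-sonnet-5` is named in this entry), and a real system would make an API call where the stub is injected.

```python
from dataclasses import dataclass
from typing import Callable

DEFAULT_MODEL = "claude-sonnet-5"      # identifier given in this entry
ESCALATION_MODEL = "claude-opus-4-8"   # assumed identifier, not confirmed


@dataclass
class Result:
    model: str
    output: str
    passed: bool   # did the output pass the caller's verifier?


def route(task: str,
          run_model: Callable[[str, str], Result],
          high_value: bool = False) -> Result:
    """Try the cheaper model first; escalate on failure or when stakes are high.

    run_model(model, task) is a caller-supplied function (hypothetical) that
    executes the task and reports whether verification passed.
    """
    if high_value:
        return run_model(ESCALATION_MODEL, task)
    first = run_model(DEFAULT_MODEL, task)
    if first.passed:
        return first
    return run_model(ESCALATION_MODEL, task)


if __name__ == "__main__":
    def fake_run(model: str, task: str) -> Result:
        # Stub: pretend only the escalation model can solve "hard" tasks.
        ok = model == ESCALATION_MODEL or "hard" not in task
        return Result(model, f"{model} handled: {task}", ok)

    print(route("rename a variable", fake_run).model)
    print(route("hard concurrency bug", fake_run).model)
    print(route("sign-off on release", fake_run, high_value=True).model)
```

The quality of the verifier behind `passed` decides how well this works: a weak check lets bad Sonnet output through, while an overly strict one escalates work that was already done correctly.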

Safety and cybersecurity

Anthropic says it did not deliberately train Sonnet 5 on cybersecurity tasks. In an exploit-development evaluation using patched Firefox vulnerabilities, Sonnet 5 never produced a complete working exploit, though it showed more partial progress than Sonnet 4.6. Its dangerous cyber capability remains substantially below Opus 4.8 and Mythos 5.

Sonnet 5 still ships with cyber safeguards enabled by default. These controls use the same general framework as Opus 4.7 and 4.8 but are less restrictive than Fable 5's safeguards because Anthropic assesses Sonnet 5's overall cyber risk as lower.

Across broader safety evaluations, Sonnet 5 is more resistant to malicious requests and prompt-injection hijacking than Sonnet 4.6. It also shows lower hallucination and sycophancy rates. However, Anthropic reports somewhat more misaligned behavior than Opus 4.8 and Mythos Preview on its automated behavioral audit.

What it is good at

Claude Sonnet 5 is best suited for production workloads where a cheaper model needs to behave like an agent rather than a chatbot: repository-scale coding, multi-file debugging, browser and terminal workflows, computer-use automation, research, legal or financial analysis, document-heavy knowledge work, and tool-using assistants that run across many steps.

It is especially attractive as the default model in a routing system. Sonnet handles the bulk of work at lower cost, while Opus or Fable can be reserved for tasks where failure is expensive enough to justify premium pricing.

Tradeoffs

Sonnet 5 is not a full replacement for Opus 4.8. It remains weaker on some difficult coding and pure-reasoning evaluations, and GPT-5.5 still leads on Terminal-Bench 2.1 in Anthropic's comparison.

The updated tokenizer can also make naive cost comparisons misleading. Teams migrating from Sonnet 4.6 should retune max-token limits and monitor actual token consumption. And as with every current coding model, launch benchmark scores should be checked against internal repositories because harness choice, effort level, tool permissions, and verifier behavior can move the result significantly.
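As a rough starting point for retuning, a limit that was sized for Sonnet 4.6 can be scaled by a token ratio and capped at the standard output ceiling. The helper below is only a sketch: the function name is hypothetical, the default ratio of 1.35 is the upper end of the range Anthropic quotes, and a ratio measured on real traffic is a better input.

```python
import math

STANDARD_OUTPUT_CEILING = 128_000  # standard-request output limit cited in this entry


def retuned_limit(old_limit: int,
                  token_ratio: float = 1.35,
                  ceiling: int = STANDARD_OUTPUT_CEILING) -> int:
    """Scale a Sonnet 4.6-era max-token limit for the new tokenizer.

    token_ratio is new-tokenizer tokens divided by old-tokenizer tokens for
    the same text. The result is rounded up and capped at the ceiling.
    """
    if token_ratio <= 0:
        raise ValueError("token_ratio must be positive")
    return min(math.ceil(old_limit * token_ratio), ceiling)


if __name__ == "__main__":
    # Hypothetical limits that might appear in an existing configuration.
    for old in (1_024, 4_096, 100_000):
        print(old, "->", retuned_limit(old))
```

Scaling by the worst-case ratio avoids truncated outputs at the cost of a looser budget; monitoring actual consumption afterwards shows whether the limit can be tightened again.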

The practical takeaway is straightforward: Sonnet 5 is the everyday agent model in Anthropic's lineup. It brings much of the recent Opus capability curve into a cheaper tier without making the premium models obsolete.

Last updated: July 1, 2026