>_TheQuery

Claude Sonnet 5 Finally Launched. The Rumors Were Wrong About Almost Everything Except That It Was Coming.

By Addy · June 30, 2026

Claude Sonnet 5 has been "launching next week" since February.

In early February, screenshots and reports circulated around a Google Vertex AI error log containing the identifier claude-sonnet-5@20260203 and the codename Fennec. A large part of the AI commentary cycle treated the identifier as proof that a generational launch was imminent. Two weeks later, Anthropic released Claude Sonnet 4.6, not Sonnet 5.

Anthropic never confirmed that the leaked identifier mapped directly to Sonnet 4.6. That is the point. The rumor cycle converted a staging artifact into a product roadmap, then treated the absence of the predicted launch as a delay.

The pattern repeated in June. Several outlets and social accounts predicted Sonnet 5 for the week of June 23, often recycling the Fennec codename and attaching estimated SWE-bench scores between 82% and 92%. None of those numbers came from Anthropic. The strongest claims were estimates layered on top of an old leak.

Sonnet 5 launched on June 30. The official benchmark numbers are real, and they are not 82% to 92% on SWE-bench Pro. The actual story is less dramatic than the rumor mill and more useful than most model launches this year, because the gap between what people expected and what shipped tells you something true about how Anthropic develops models.

Why It Took This Long

The honest answer is unglamorous: there is no public evidence that Sonnet 5 was a finished model Anthropic simply held back for five months.

Internal model identifiers are staging labels. Partner platforms can expose them through error messages, integration tests, and catalog changes well before public launch. A slug appearing in a log is evidence that development or integration work exists. It is not a release date, a final model name, or a guarantee that the checkpoint will ship unchanged.

This is a distinct story from Claude Fable 5. Fable 5 and Mythos 5 were restricted by a US government export-control process after they had already shipped. Sonnet 5's June 30 release was not publicly tied to an external directive. Axios reported that Anthropic's government discussions included Sonnet 5, but the company launched it broadly rather than through the limited access process applied to Mythos and GPT-5.6.

There is also no reliable evidence that Anthropic delayed Sonnet 5 to manage the optics of Fable 5. Fable is a policy and access story. Sonnet 5 is a model-development and product-positioning story. The fact that both unfolded in the same six-week window does not make one the cause of the other.

The lesson for anyone tracking AI releases through partner platform logs is simple: a slug in a catalog is a clue, not an announcement.

What Shipped

Claude Sonnet 5 is Anthropic's most agentic Sonnet model yet. It can plan, use browsers and terminals, call tools, and sustain autonomous work at a level that previously required more expensive Opus-class models. The positioning is specific: Sonnet 5 does not replace Claude Opus 4.8. It narrows the gap enough that price becomes the deciding factor for many production workloads.

The pricing supports that positioning. Sonnet 5 launches at USD 2 per million input tokens and USD 10 per million output tokens through August 31, 2026. Standard pricing then becomes USD 3 input and USD 15 output. Opus 4.8 costs USD 5 input and USD 25 output. Sonnet 5 therefore costs 40% of Opus during the introductory window and 60% of Opus at standard pricing.
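The ratios above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the list prices quoted in this article; the model keys and the 200k-input, 20k-output workload are hypothetical names and numbers chosen for illustration, not anything from Anthropic's API.

```python
# Per-million-token list prices in USD, as quoted in the article.
# The dictionary keys are illustrative labels, not real API model IDs.
PRICES = {
    "sonnet5_intro": {"input": 2.0, "output": 10.0},
    "sonnet5_standard": {"input": 3.0, "output": 15.0},
    "opus48": {"input": 5.0, "output": 25.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD for one request."""
    rates = PRICES[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


def cost_ratio(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of `model` as a fraction of `baseline` for the same token counts."""
    return request_cost(model, input_tokens, output_tokens) / request_cost(
        baseline, input_tokens, output_tokens
    )


# Hypothetical workload: 200k input tokens and 20k output tokens.
intro = cost_ratio("sonnet5_intro", "opus48", 200_000, 20_000)
standard = cost_ratio("sonnet5_standard", "opus48", 200_000, 20_000)
print(f"intro: {intro:.0%} of Opus, standard: {standard:.0%} of Opus")
```

Because the input and output rates both scale by the same factor, the ratio holds for any input/output mix, as long as both models see the same token counts.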

There is a caveat hidden inside that clean comparison. Sonnet 5 uses a new tokenizer that Anthropic says can map the same text to roughly 1.0 to 1.35 times as many tokens depending on content. The introductory price is designed to keep migration from Sonnet 4.6 roughly cost-neutral. After August, teams should measure cost per completed task rather than assuming the nominal rate tells the whole story.
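One rough way to see the tokenizer effect is to express the new rate per million tokens as the old tokenizer would have counted them. The sketch below assumes the 1.0 to 1.35 multiplier applies uniformly to whatever text you send, which real workloads will not match exactly; the function name is made up for this example.

```python
# Range of token inflation the article attributes to the new tokenizer.
MULTIPLIERS = (1.0, 1.35)

# Sonnet 5 input list prices in USD per million tokens, from the article.
INPUT_RATES = {"intro": 2.0, "standard": 3.0}


def effective_rate(list_rate_usd: float, token_multiplier: float) -> float:
    """Price per million old-tokenizer tokens after applying token inflation.

    Assumption: the same text simply becomes `token_multiplier` times as many
    billable tokens, so the effective rate scales linearly.
    """
    return list_rate_usd * token_multiplier


EFFECTIVE = {
    label: tuple(effective_rate(rate, m) for m in MULTIPLIERS)
    for label, rate in INPUT_RATES.items()
}

for label, (low, high) in EFFECTIVE.items():
    print(f"{label}: USD {low:.2f} to {high:.2f} per million old-tokenizer input tokens")
```

Under that assumption the introductory input rate lands between USD 2.00 and 2.70 in old-tokenizer terms, and the standard rate between USD 3.00 and 4.05, which is why the nominal rate card alone understates the possible change after August.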

Sonnet 5 is now the default model for Claude Free and Pro plans and is available to Max, Team, and Enterprise users. It is live in Claude Code and through the Claude API. GitHub also made Sonnet 5 generally available in GitHub Copilot on launch day. Cloud-platform availability exists across Anthropic's ecosystem, though specific safeguards and regional rollouts can vary by provider.


The model ships with a 1 million token context window by default and supports 128,000 output tokens on standard requests. Anthropic's Message Batches API can raise the output ceiling to 300,000 tokens using its existing beta header. The reliable knowledge cutoff is January 2026.

Adaptive thinking is the recommended mode, and users can control effort to trade latency and token spend against capability. This is the effort-control system Anthropic developed across the recent Opus line, now turned into the default operating model for Sonnet.

The Benchmarks, Without the Rumor Inflation

The 82% to 92% SWE-bench range that circulated before launch did not materialize. The official Sonnet 5 system card reports a more modest and more legible set of results.

| Benchmark | Sonnet 4.6 | Sonnet 5 | GPT-5.5 | Gemini 3.5 Flash |
| --- | --- | --- | --- | --- |
| SWE-bench Pro | 58.1% | 63.2% | 58.6% | 55.1% |
| Terminal-Bench 2.1 | 67.0% | 80.4% | 83.4% | 76.2% |
| BrowseComp, single agent | 76.2% | 84.7% | 84.4% | Not reported |
| Humanity's Last Exam, no tools | 34.6% | 43.2% | 41.4% | 40.2% |
| Humanity's Last Exam, with tools | 46.8% | 57.4% | 52.2% | Not reported |
| OSWorld-Verified | 78.5% | 81.2% | 78.7% | 78.4% |
| FrontierCode v1 | 15.1% | 38.8% | 25.5% | Not reported |
| GDPval-AA v2, Elo | 1395 | 1618 | 1509 | 1357 |

These are Anthropic's launch configurations. Claude results generally use adaptive thinking at maximum effort and are averaged over multiple trials. Competitor values come from published provider results or benchmark leaderboards, so the table is useful directionally rather than as a perfectly controlled laboratory comparison.

The headline coding number is 63.2% on SWE-bench Pro, up from Sonnet 4.6's 58.1%. GPT-5.5 scores 58.6% in Anthropic's comparison. Sonnet 5 therefore leads GPT-5.5 by 4.6 points on this benchmark while still trailing Opus 4.8's 69.2%.

The upgrade closes about 46% of the SWE-bench Pro gap between Sonnet 4.6 and Opus 4.8. That is meaningful, but it is not the near-total gap closure suggested by some launch-day summaries. The honest read is that Sonnet 5 moves the mid-tier materially closer to Opus without replacing it.
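The 46% figure follows directly from the three SWE-bench Pro scores quoted above. A minimal sketch of the calculation, with a helper name invented for this example:

```python
def gap_closed(old: float, new: float, target: float) -> float:
    """Fraction of the old-to-target gap covered by moving from old to new."""
    return (new - old) / (target - old)


# SWE-bench Pro scores in percent, as quoted in the article.
SONNET_46, SONNET_5, OPUS_48 = 58.1, 63.2, 69.2

closure = gap_closed(SONNET_46, SONNET_5, OPUS_48)
remaining = OPUS_48 - SONNET_5
print(f"{closure:.0%} of the gap closed, {remaining:.1f} points remaining")
```

Sonnet 5 gained 5.1 points out of an 11.1 point gap, which leaves 6.0 points between it and Opus 4.8.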

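GPT-5.5 still leads on Terminal-Bench 2.1, 83.4% versus Sonnet 5's 80.4%. That is the one row in Anthropic's own comparison where GPT-5.5 finishes ahead. Sonnet 5 leads on every other row: narrowly on BrowseComp (84.7% versus 84.4%), and more clearly on SWE-bench Pro, both Humanity's Last Exam settings, OSWorld computer use, FrontierCode, and GDPval-AA v2.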
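Encoding the comparison columns of the table makes it easy to check who leads each row and by how much. This is a sketch over the launch numbers reported in this article only; cells listed as "Not reported" are left out, and the function name is illustrative.

```python
# Scores copied from the comparison table; "Not reported" cells are omitted.
SCORES = {
    "SWE-bench Pro": {"Sonnet 5": 63.2, "GPT-5.5": 58.6, "Gemini 3.5 Flash": 55.1},
    "Terminal-Bench 2.1": {"Sonnet 5": 80.4, "GPT-5.5": 83.4, "Gemini 3.5 Flash": 76.2},
    "BrowseComp, single agent": {"Sonnet 5": 84.7, "GPT-5.5": 84.4},
    "HLE, no tools": {"Sonnet 5": 43.2, "GPT-5.5": 41.4, "Gemini 3.5 Flash": 40.2},
    "HLE, with tools": {"Sonnet 5": 57.4, "GPT-5.5": 52.2},
    "OSWorld-Verified": {"Sonnet 5": 81.2, "GPT-5.5": 78.7, "Gemini 3.5 Flash": 78.4},
    "FrontierCode v1": {"Sonnet 5": 38.8, "GPT-5.5": 25.5},
    "GDPval-AA v2, Elo": {"Sonnet 5": 1618, "GPT-5.5": 1509, "Gemini 3.5 Flash": 1357},
}


def leader_and_margin(row: dict) -> tuple:
    """Return (leading model, lead over the runner-up) for one benchmark row."""
    ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
    (top_model, top_score), (_, second_score) = ranked[0], ranked[1]
    return top_model, round(top_score - second_score, 1)


LEADERS = {name: leader_and_margin(row) for name, row in SCORES.items()}
GPT_WINS = [name for name, (model, _) in LEADERS.items() if model == "GPT-5.5"]

for name, (model, margin) in LEADERS.items():
    print(f"{name}: {model} leads by {margin}")
```

The margins are in each benchmark's own units (percentage points, or Elo for GDPval-AA v2), so they should not be compared across rows.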

Knowledge work is the most interesting result. Sonnet 5 reaches 1618 Elo on GDPval-AA v2. Anthropic's system card separately reports Opus 4.8 at 1615, making the two statistically tied and placing Sonnet 5 slightly higher on the point estimate. The cheaper model does not merely approach Opus on that evaluation. It matches it.

There is no official controlled comparison against GPT-5.6 because GPT-5.6 had not reached general availability at launch. Any Sonnet 5 versus GPT-5.6 comparison circulating online is unofficial and should be read accordingly.

The same caveat TheQuery has applied to every 2026 coding launch still matters: benchmark scores depend on the harness, tool permissions, effort level, token budget, and verifier. The system card is much more useful than the rumor numbers because it documents those conditions. It is not a substitute for testing the model on your own repository.

The Cybersecurity Posture, Read Correctly

Given the Fable 5 and GPT-5.6 government restrictions, Sonnet 5's safety framing deserves a precise reading rather than a connected-dots assumption.

Anthropic states directly that it did not deliberately train Sonnet 5 on cybersecurity tasks. The model can handle routine, non-harmful cyber work, but its performance on dangerous cybersecurity evaluations is substantially below Opus 4.8 and Mythos 5. In a patched Firefox exploit evaluation, neither Sonnet 4.6 nor Sonnet 5 produced a complete working exploit. Sonnet 5 showed a slightly higher rate of partial progress, which Anthropic attributes to broader intelligence gains rather than specialized cyber training.

Sonnet 5 ships with cyber guardrails enabled by default. They use the same general framework present in Opus 4.7 and 4.8, but are less strict than the safeguards introduced with Fable 5 because Anthropic assesses Sonnet 5's overall cyber risk as lower.

This is the opposite situation from Fable 5. Fable was restricted because of dangerous capability and concerns about bypassing its safeguards. Sonnet 5 carries lighter restrictions because Anthropic's evaluations found much less capability in the same domain. Reading both as expressions of the same gatekeeping pattern misses what each safeguard is responding to.

Beyond cybersecurity, Sonnet 5 performs better than Sonnet 4.6 on Anthropic's safety evaluations. It is more resistant to malicious requests and prompt injection, and it shows lower rates of hallucination and sycophancy. It still shows somewhat more misaligned behavior than Opus 4.8 and Mythos Preview on Anthropic's automated behavioral audit.

The distinction matters. Sonnet 5 is safer than the model it replaces. It is not the safest model Anthropic has evaluated.

What This Means for the Sonnet Tier

Sonnet 5's value proposition is not that it secretly became Opus. It is that a meaningful part of Opus-class agent performance moved into a cheaper default model.

During the introductory period, Sonnet 5 costs 40% as much as Opus 4.8 at list price. After August, it costs 60% as much. On SWE-bench Pro it remains about six points behind. On GDPval-AA v2 it is statistically tied. On OSWorld it leads GPT-5.5 and Gemini 3.5 Flash in Anthropic's comparison. On Terminal-Bench it remains behind GPT-5.5.

That mix makes Sonnet 5 the practical default for production teams that need strong coding, tool use, computer interaction, and knowledge work without routing every request to an Opus-tier model. Opus remains the escalation model for the hardest reasoning and coding tasks. Sonnet becomes the model that handles the bulk of the traffic.

The updated tokenizer complicates the price story enough that teams should benchmark total task cost before migrating. The model may use more tokens for the same text, but it may also finish complex tasks in fewer turns. Rate cards do not settle that question. Production traces do.
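A minimal sketch of what "cost per completed task" could look like when computed from logged traces. The `Trace` fields and the sample numbers are hypothetical and do not come from any Anthropic SDK; the rates are the standard Sonnet 5 list prices quoted earlier.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One logged task attempt (hypothetical schema for illustration)."""
    input_tokens: int
    output_tokens: int
    completed: bool


def cost_per_completed_task(traces, input_rate: float, output_rate: float) -> float:
    """Total spend in USD across all attempts divided by successful tasks.

    Rates are USD per million tokens. Failed attempts still count toward
    spend, which is the point of measuring per completed task.
    """
    spend = sum(
        t.input_tokens * input_rate + t.output_tokens * output_rate for t in traces
    ) / 1_000_000
    completed = sum(1 for t in traces if t.completed)
    if completed == 0:
        raise ValueError("no completed tasks in the trace sample")
    return spend / completed


# Made-up sample: two successes and one failure.
sample = [
    Trace(input_tokens=120_000, output_tokens=8_000, completed=True),
    Trace(input_tokens=90_000, output_tokens=6_000, completed=True),
    Trace(input_tokens=150_000, output_tokens=12_000, completed=False),
]
per_task = cost_per_completed_task(sample, input_rate=3.0, output_rate=15.0)
print(f"USD {per_task:.3f} per completed task")
```

Running the same function over traces from two models on the same task set gives a comparison that already includes tokenizer differences, retries, and failure rates, none of which show up on a rate card.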

The rumor mill expected a dramatic replacement for the frontier. Anthropic shipped something more commercially important: a model that moves expensive agent capability into the default tier without pretending the premium tier no longer matters.

The rumors were wrong about the date, the benchmark range, and what the Fennec identifier proved. They were right about one thing.

Sonnet 5 was coming.

Previously on TheQuery: "The US Government Is Now Approving AI Models Before They Ship" and "Claude Fable 5 Is the Mythos Model You Can Actually Use"