
Claude Opus 4.6

LLM Models

Anthropic's February 2026 flagship model, with a 1M-token context window (beta), 80.8% on SWE-bench Verified, 68.8% on ARC-AGI-2, and the highest Terminal-Bench 2.0 score among frontier models.

Claude Opus 4.6, released by Anthropic on February 5, 2026, is the company's most capable model to date. It offers a 1 million token context window (in beta) and up to 128K output tokens, introduces adaptive thinking, and achieves the highest agentic coding scores Anthropic has reported so far. Compared with its predecessor, the model plans more carefully, sustains agentic tasks for longer, operates more reliably in large codebases, and is better at code review and debugging, which helps it catch its own mistakes.

Opus 4.6 leads all frontier models on Terminal-Bench 2.0 (65.4%) for agentic coding and on Humanity's Last Exam (53.1% with tools, 40.0% without) for complex multidisciplinary reasoning. It scores 80.8% on SWE-bench Verified, 72.7% on OSWorld-Verified for computer use, and 68.8% on ARC-AGI-2. The ARC-AGI-2 result is up 31.2 points from the 37.6% scored by Opus 4.5, the largest single-generation leap on this benchmark. On GDPval-AA (knowledge work in finance, legal, and other domains), Opus 4.6 outperforms GPT-5.2 by 144 Elo points and Opus 4.5 by 190 points. On MRCR v2 (8-needle) long-context retrieval, it scores 93% at 256K tokens and 76% at 1M tokens.
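To give the Elo gaps above a more intuitive reading, the sketch below converts a rating difference into an expected head-to-head score. It assumes GDPval-AA uses the conventional Elo logistic curve with a 400-point scale, which this entry does not confirm, and the function name `elo_expected_score` is purely illustrative.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side for a given rating gap.

    Assumption: standard Elo logistic curve with a 400-point scale.
    The benchmark's actual rating method may differ.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Rating gaps quoted in the entry above.
for label, gap in [("vs GPT-5.2", 144), ("vs Opus 4.5", 190)]:
    print(f"{label}: {elo_expected_score(gap):.1%}")
```

Under that assumption, a 144-point gap corresponds to an expected score of roughly 70%, and a 190-point gap to roughly 75%.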

Pricing is $5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.5. The model is available on claude.ai, the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI. Anthropic states that Opus 4.6 shows an overall safety profile as good as or better than any other frontier model in the industry.
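The per-token rates above translate into a simple cost estimate. The sketch below assumes the two flat rates apply to every token in a request; any other pricing tiers or discounts are outside what this entry states. The constant and function names are hypothetical.

```python
# Rates quoted in this entry, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, assuming flat per-token rates only."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 200K-token prompt with a 4K-token response.
print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints $1.10
```

Because output tokens cost five times as much as input tokens, long generated responses can dominate the bill even when the prompt is far larger.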

Last updated: February 23, 2026