GPT-5.5

Models

An OpenAI frontier model designed for complex real-world work, with strong benchmark performance in agentic coding, computer use, and professional knowledge tasks.

Think of it like bringing in a senior specialist instead of a generalist: you do it for the hardest calls, not for every routine task.

GPT-5.5 is an OpenAI frontier model built for complex, execution-heavy work: coding, online research, document creation, spreadsheet work, and multi-step tasks that span tools. OpenAI positions it as a model that understands goals earlier, uses tools more effectively, checks its own work, and carries tasks further before handing control back to the user.

Why it matters

GPT-5.5 represents a shift from models that merely answer well to models that can keep working through a job. That makes it especially relevant for agentic systems, coding agents, and professional workflows where persistence, context handling, and tool use matter as much as raw reasoning quality.

Official benchmark profile

At launch, OpenAI reported several strong benchmark results for GPT-5.5:

Terminal-Bench 2.0: 82.7%, which OpenAI presented as state-of-the-art for complex terminal workflows
SWE-Bench Pro: 58.6% on real-world GitHub issue resolution
GDPval: 84.9% on professional knowledge work across occupations
OSWorld-Verified: 78.7% for computer-use tasks in real software environments
BrowseComp: 84.4% on tool-assisted browsing and web research

OpenAI's public comparison table also showed GPT-5.5 ahead of GPT-5.4 on these evaluations while using fewer tokens on coding tasks.

Practical tradeoffs

At launch, OpenAI said GPT-5.5 would come to the API with a 1 million token context window and pricing of $5 per million input tokens and $30 per million output tokens. That places it in the premium tier: powerful enough for difficult work, but expensive enough that teams usually reserve it for high-value steps like planning, escalation, code review, or long-horizon agent tasks.

In practice, GPT-5.5 is the kind of model you use when a cheaper model is likely to fail quietly: subtle bugs, fragile tool use, or incomplete multi-step execution. Its value is less about sounding smarter in chat and more about finishing hard work with fewer interventions.

Last updated: April 30, 2026

GPT-5.5

Why it matters

Official benchmark profile

Practical tradeoffs

Related Terms