Qwen 3.7

LLM Models

Alibaba's May 2026 Qwen generation led by the closed-weight Qwen3.7-Max agent model, with preview Plus and expected mid-tier variants forming the broader family.

A flagship workshop with locked doors: the best tools are inside and ready for enterprise work, but developers can use them only through Alibaba Cloud rather than taking the machines home.

Qwen 3.7 is Alibaba's May 2026 generation of Qwen models, formally led by Qwen3.7-Max, a proprietary, API-only model designed for agentic coding, office automation, long-horizon tool use, and one-million-token context workflows. Unlike the open-weight Qwen 3 and Qwen 3.5 releases, the first Qwen 3.7 flagship release is not available as downloadable weights on Hugging Face, GGUF, Ollama, or a self-hosted checkpoint.

The Qwen 3.7 family is best understood as a hosted flagship line rather than a fully open model drop. As of May 26, 2026, the public family includes:

Model	Status	Role	Notes
Qwen3.7-Max	API-only	Flagship agent model	Built for coding agents, long-horizon autonomous execution, MCP tool use, and one-million-token context workloads.
Qwen3.7-Max-Preview	Preview / evaluation SKU	Leaderboard and early-access version	Used in public previews and benchmark discussions before or alongside the commercial Max endpoint.
Qwen3.7-Plus-Preview	Preview	Multimodal sibling	Reported as the Plus-side preview for vision and broader multimodal use, while Max is positioned around agentic text, coding, and tool execution.
Qwen 3.7 open or mid-tier variants	Not officially released	Expected ecosystem layer	Developers expect smaller or mid-tier open-weight variants based on Qwen's earlier release cadence, but Alibaba has not made those weights available yet.

Qwen3.7-Max is positioned above Qwen 3.6 Plus and Qwen 3.5 in Alibaba's hosted model stack. It is especially aimed at agent workloads: multi-file software engineering, tool calls across external systems, office document automation, and tasks that require the model to maintain state across many steps. Alibaba's launch material highlighted a 35-hour autonomous kernel optimization run with more than 1,000 tool calls as the signature example of what the model was built to do.

Benchmark Comparison

The cleanest comparison is against Claude Opus 4.7 and GPT-5.5, because all three are positioned as frontier-class agent and reasoning models rather than small local models. The important pattern is mixed: Qwen 3.7 Max is close to the frontier on reasoning and coding, but GPT-5.5 keeps a large terminal-workflow lead while Opus 4.7 remains stronger on repository-level coding.

Benchmark	Qwen 3.7 Max	Claude Opus 4.7	GPT-5.5	Read
GPQA Diamond	92.4%	94.2%	93.6%	All three are close; Opus has the narrow reasoning lead.
SWE-bench Pro	60.6%	64.3%	58.6%	Opus leads repository-level coding; Qwen clears GPT-5.5 on this metric.
Terminal-Bench 2.0 family	69.7%	69.4%	82.7%	GPT-5.5 is clearly ahead on terminal-heavy execution.
MCP-Atlas / tool orchestration	76.4%	77.3%	75.3%	Opus and Qwen are tightly grouped for complex tool workflows.
Humanity's Last Exam	41.4%	46.9%	41.4%	Opus holds the broad expert-reasoning lead without tools.

The strategic shift is the important part. Qwen's earlier reputation came from releasing strong open-weight models under permissive licenses. Qwen 3.7 Max moves the best model in the family behind an API gate, using open and mid-tier releases to support the ecosystem while reserving the flagship for hosted enterprise revenue. That makes Qwen 3.7 both a technical upgrade and a change in Alibaba's open-weight playbook.

References & Resources

Last updated: May 26, 2026

Qwen 3.7

Benchmark Comparison

References & Resources

Related Terms