
GLM-5


Z.ai's 744B-parameter open-weight MoE model for agentic engineering, coding, reasoning, and long-horizon tool-use tasks.

Like a very large engineering team where only the relevant specialists show up for each task: the full organization is huge, but each token only activates the experts it needs.

GLM-5 is a large language model from Z.ai, formerly Zhipu AI, released in February 2026 as the successor to GLM-4.5 and GLM-4.7. The model is built for what Z.ai calls agentic engineering: long-horizon coding, tool use, system building, debugging, and multi-step software engineering work that goes beyond single-turn code completion.

The base GLM-5 model uses a Mixture-of-Experts architecture with 744 billion total parameters and about 40 billion active parameters per token. Compared with GLM-4.5, Z.ai says it scaled from 355B total parameters and 32B active parameters to 744B total and 40B active, while increasing pretraining data from 23 trillion to 28.5 trillion tokens. It also integrates DeepSeek Sparse Attention, or DSA, to reduce deployment cost while preserving long-context capability.
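To make the sparse-activation idea concrete, here is a toy sketch of top-k expert routing in Python. The expert count, the top-k value, and the gating function below are illustrative assumptions; the article only gives the total and active parameter counts, not GLM-5's actual routing details.

```python
import numpy as np

# Toy Mixture-of-Experts routing sketch with made-up sizes.
N_EXPERTS = 8   # hypothetical expert count
TOP_K = 2       # hypothetical number of experts activated per token
D_MODEL = 16    # toy hidden size

rng = np.random.default_rng(0)
gate_W = rng.normal(size=(D_MODEL, N_EXPERTS))                    # router weights
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ gate_W
    top = np.argsort(logits)[-TOP_K:]                             # chosen expert indices
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()     # softmax over chosen
    # Only TOP_K of the N_EXPERTS expert networks run for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)  # (16,)
```

The point of the routing step is that only a few expert matrices are multiplied per token, which is how a 744B-parameter model can run with roughly 40B parameters' worth of compute per token.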

Why GLM-5 Matters

GLM-5 matters because it represents the Chinese open-weight model ecosystem moving from strong chat and coding models toward full agentic software engineering systems. The model is not optimized only for answering questions. It is optimized for long-running tasks where the model has to reason, call tools, inspect results, recover from errors, and keep working across many steps.
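That reason-act-observe loop is easy to sketch. The snippet below is a minimal, hypothetical agent loop: `call_model` is a scripted stand-in for a real GLM-5 chat endpoint and `run_shell` is an example tool, neither taken from any actual Z.ai API.

```python
import subprocess

def run_shell(cmd: str) -> str:
    """Hypothetical tool: run a shell command, return combined output."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"run_shell": run_shell}

def call_model(messages):
    """Stand-in for a real chat call to a GLM-5 endpoint, scripted here
    so the sketch runs end to end: first request a tool, then finish."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "run_shell", "args": {"cmd": "echo hello"}}}
    return {"tool_call": None, "content": "Task complete."}

def agent_loop(task: str, max_steps: int = 20) -> str:
    """Reason, call a tool, inspect the result, repeat until done."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if reply["tool_call"] is None:            # model decided it is finished
            return reply["content"]
        name = reply["tool_call"]["name"]
        args = reply["tool_call"]["args"]
        observation = TOOLS[name](**args)          # execute tool, capture output
        messages.append({"role": "tool", "content": observation})
    return "step budget exhausted"

print(agent_loop("Check that the shell tool works"))  # -> Task complete.
```

Error recovery in a real agent falls out of the same structure: a failed tool call goes back into the message history as an observation, and the model plans its next step from there.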

In Z.ai's reported benchmarks, GLM-5 scores 77.8 on SWE-bench Verified, 73.3 on SWE-bench Multilingual, 92.7 on AIME 2026 I, 86.0 on GPQA-Diamond, and 62.0 on BrowseComp. The important pattern is not any single number. It is that GLM-5 is competitive with closed frontier models on coding, browsing, reasoning, and agentic benchmarks while publishing open weights.

GLM-5 vs GLM-4.5

| Feature | GLM-4.5 | GLM-5 |
| --- | --- | --- |
| Release period | July 2025 | February 2026 |
| Total parameters | 355B | 744B |
| Active parameters per token | 32B | 40B |
| Pretraining data | 23T tokens | 28.5T tokens |
| Architecture | MoE agentic model | MoE with DeepSeek Sparse Attention |
| Main focus | Agentic, reasoning, and coding tasks | Long-horizon agentic engineering and complex systems work |
| Open weights | Yes | Yes |
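Some rough arithmetic shows what the parameter counts in the table above imply in practice. The sketch below only counts weight memory at two common precisions; it ignores KV cache, activations, and serving overhead, so real requirements are higher.

```python
TOTAL_PARAMS = 744e9   # every expert must be resident in memory
ACTIVE_PARAMS = 40e9   # parameters actually exercised per token

# Weight memory alone, before KV cache and runtime overhead.
for precision, bytes_per_param in [("FP8", 1), ("BF16", 2)]:
    gib = TOTAL_PARAMS * bytes_per_param / 1024**3
    print(f"{precision} weights: ~{gib:,.0f} GiB")
# FP8 weights: ~693 GiB
# BF16 weights: ~1,386 GiB

# Per-token compute scales with active parameters, not total ones.
print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")  # ~5.4%
```

This is the tension the next section describes: open weights make local serving possible, but the sheer weight footprint is why it takes serious infrastructure.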

Deployment

The weights are available through the Z.ai organization on Hugging Face and ModelScope. The Hugging Face model card lists an MIT license and shows support for local serving through frameworks such as vLLM, SGLang, KTransformers, Transformers, and xLLM. That makes GLM-5 relevant for developers who want a frontier-class model that can run outside a closed API environment, though serving a 744B MoE model still requires serious infrastructure.
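As a hedged sketch of what local serving can look like, the snippet below assumes vLLM's OpenAI-compatible server. The repo id `zai-org/GLM-5`, the port, and the parallelism setting are assumptions for illustration; check the actual Hugging Face model card before relying on them.

```python
# Server side (shell), assuming a multi-GPU node:
#   vllm serve zai-org/GLM-5 --tensor-parallel-size 8
#
# Client side, using the OpenAI-compatible endpoint vLLM exposes:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zai-org/GLM-5",  # assumed repo id; verify on Hugging Face
    messages=[
        {"role": "user", "content": "Explain this failing test and propose a fix: ..."},
    ],
)
print(response.choices[0].message.content)
```

The same client code works against any OpenAI-compatible endpoint, which is one reason open-weight models like GLM-5 slot into existing agent stacks without bespoke integration work.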

GLM-5 should also be distinguished from GLM-5.1, the later Z.ai update that focuses even more heavily on sustained long-horizon agentic work. GLM-5 is the base release that established the 744B-A40B architecture and benchmark position; GLM-5.1 builds on that line.

Practical Use Cases

GLM-5 is most relevant for agentic coding systems, repository-scale software engineering, autonomous debugging, browser and terminal tasks, research automation, tool-calling agents, and multilingual coding workflows. It is less relevant for simple chatbots or lightweight local applications, where smaller models will be much cheaper and easier to serve.

The short version: GLM-5 is not just another large chat model. It is an open-weight attempt to make frontier agentic engineering capability available outside the closed-model API stack.

Last updated: May 15, 2026