Sparse Model
Architecture
A model where only a subset of parameters is activated for any given input, reducing compute requirements while maintaining the capacity benefits of a larger network.
A sparse model is a neural network designed so that each input activates only a fraction of the model's total parameters. This contrasts with dense models, where every parameter participates in every computation. Sparsity allows models to scale to very large parameter counts without proportionally increasing the compute required for inference.
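The gap between total and active parameters is easiest to see with arithmetic. A minimal sketch, using purely hypothetical numbers (not any specific model) for an MoE-style layer with shared always-active components:

```python
# Hypothetical MoE-style model: illustrative numbers, not any real model.
num_experts = 8                 # total expert sub-networks
experts_per_token = 2           # how many the router activates per token
params_per_expert = 100_000_000
shared_params = 50_000_000      # attention, embeddings, etc. -- always active

total_params = shared_params + num_experts * params_per_expert
active_params = shared_params + experts_per_token * params_per_expert

print(f"total:  {total_params:,}")
print(f"active: {active_params:,} ({active_params / total_params:.0%} of total)")
```

With these made-up numbers, the model holds 850M parameters but each token only touches 250M of them, so inference compute scales with the active count, not the total.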
The most common form of learned sparsity in modern AI is the Mixture of Experts architecture, where a routing mechanism selects which sub-networks process each token. But sparsity also appears in other forms: weight pruning removes individual connections that contribute little to model output, structured pruning removes entire neurons or attention heads, and activation sparsity exploits the fact that many neurons naturally output zero for most inputs.
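The routing step in a Mixture of Experts layer can be sketched in a few lines. This is a toy illustration, not any production router: the expert functions, the scores, and top-2 selection are all assumptions chosen for clarity.

```python
import math

def route_top_k(logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their gate weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Hypothetical stand-ins for expert sub-networks: each just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]

logits = [0.1, 2.0, -1.0, 1.5, 0.0, 0.3, -0.5, 0.9]  # router scores for one token
idx, gates = route_top_k(logits, k=2)

# Only the two selected experts run; the other six are skipped entirely.
y = sum(g * experts[i](10.0) for g, i in zip(gates, idx))
print(idx, round(y, 3))
```

The key property is in the last step: the per-token cost depends on k, not on the total number of experts, which is what lets parameter count and inference compute scale independently.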
Sparse models represent a fundamental shift in how the field thinks about scaling. Early scaling laws suggested that performance improves predictably with total parameter count, but sparse architectures show that what matters is not how many parameters exist but how intelligently they are allocated. A well-designed sparse model can match or exceed a dense model with many times its active parameter count, which is why most frontier models released in 2025 and 2026 use some form of sparsity.
Last updated: March 5, 2026