Nemotron 3 Super
NVIDIA's March 2026 open-weight reasoning model with 120 billion total parameters and 12 billion active, combining Mamba and transformer layers in a hybrid MoE architecture with a 1 million token context window.
A sports car engine with cylinder deactivation -- all 120 billion parameters are there when you need them, but only 12 billion fire on the highway to save fuel without losing speed.
Nemotron 3 Super is NVIDIA's flagship open-weight model released on March 11, 2026. It introduces a hybrid mixture-of-experts architecture that combines transformer attention layers with Mamba state-space model layers -- a design that improves long-context efficiency while reducing the computational overhead of pure attention at large context lengths. NVIDIA's Latent MoE technique further optimizes expert routing, so that of the model's 120 billion total parameters only 12 billion are active per token at inference.
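The sparse-activation idea can be sketched with a toy top-k router. Everything below is illustrative: the expert count, logits, and top-1 routing are hypothetical choices (Nemotron's actual routing and the Latent MoE details are not described here); the point is simply that only a fixed fraction of expert parameters fires per token.

```python
import math

# Toy sketch of top-k mixture-of-experts routing (hypothetical numbers, not
# Nemotron's real config): 10 experts with top-1 routing, so roughly 1/10 of
# expert parameters are active per token -- mirroring the 120B-total /
# 12B-active ratio described above.

NUM_EXPERTS = 10
TOP_K = 1

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, top_k=TOP_K):
    """Return {expert_index: renormalized weight} for the top-k experts."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}

# One token's router logits (made-up values).
logits = [0.1, 2.3, -0.5, 0.0, 1.1, -1.2, 0.4, 0.2, -0.3, 0.7]
weights = route(logits)
print(weights)             # top-1 routing: single expert with weight 1.0
print(TOP_K / NUM_EXPERTS) # 0.1 -> ~10% of expert parameters used per token
```

The router still has access to all experts at every step; sparsity comes from selecting only `top_k` of them per token, which is why total parameter count and per-token compute can diverge by an order of magnitude.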
The model supports a 1 million token context window -- matching the longest context windows offered by proprietary frontier models -- and is designed for agentic and deep research workflows that require processing large documents, codebases, or extended conversation histories.
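Back-of-the-envelope arithmetic shows why mixing in state-space layers matters at this scale. The dimensions below are hypothetical placeholders, not Nemotron's published config: the contrast to observe is that an attention layer's KV cache grows linearly with context length, while a Mamba-style layer carries a fixed-size recurrent state regardless of how long the context is.

```python
# Hypothetical sizes for a 1M-token context (none of these are Nemotron's
# actual dimensions; they only illustrate the scaling behavior).

CONTEXT_LEN = 1_000_000   # 1M-token window
N_KV_HEADS = 8            # assumed grouped-query KV heads
HEAD_DIM = 128            # assumed head dimension
BYTES = 2                 # fp16/bf16 bytes per element

# Per attention layer: keys + values cached for every token in context.
kv_cache_per_layer = 2 * CONTEXT_LEN * N_KV_HEADS * HEAD_DIM * BYTES

# Per Mamba-style layer: a fixed recurrent state, independent of context
# length (assumed state size).
D_MODEL = 4096
STATE_DIM = 16
mamba_state_per_layer = D_MODEL * STATE_DIM * BYTES

print(f"attention KV cache / layer: {kv_cache_per_layer / 2**30:.1f} GiB")
print(f"mamba state / layer:        {mamba_state_per_layer / 2**20:.3f} MiB")
```

Under these assumptions a single attention layer needs several GiB of KV cache at 1M tokens, while a state-space layer needs a fraction of a MiB; replacing a share of attention layers with Mamba layers therefore cuts long-context memory and bandwidth roughly in proportion.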
Performance improvements over the previous generation are significant: up to 5x higher throughput, 2x accuracy gains, and 3x faster inference through Multi-Token Prediction, which generates multiple tokens per forward pass rather than one. On NVIDIA's Blackwell hardware using NVFP4 precision, inference is up to 4x faster than FP8 on the previous Hopper generation.
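The speedup mechanism behind multi-token prediction can be sketched as a draft-and-verify loop: each forward pass proposes several tokens, and a verification step accepts a prefix of them. The simulation below is a made-up Bernoulli acceptance model, not NVIDIA's actual MTP implementation; it only shows how tokens-per-forward-pass climbs above one when drafts are usually accepted.

```python
import random

# Toy simulation of MTP-style decoding. `accept_prob` is a hypothetical
# per-token acceptance rate, standing in for how often a drafted token
# matches what the full model would have produced.

def decode(total_tokens, draft_len, accept_prob, rng):
    """Return the number of forward passes needed to emit total_tokens."""
    generated = 0
    forward_passes = 0
    while generated < total_tokens:
        forward_passes += 1
        # Accept drafted tokens left-to-right until the first rejection.
        accepted = 0
        for _ in range(draft_len):
            if rng.random() < accept_prob:
                accepted += 1
            else:
                break
        # Even a full rejection still yields one token from this pass.
        generated += max(accepted, 1)
    return forward_passes

rng = random.Random(0)
passes = decode(total_tokens=1000, draft_len=4, accept_prob=0.8, rng=rng)
print(f"tokens per forward pass: {1000 / passes:.2f}")
```

With a zero acceptance rate the loop degenerates to one token per pass (ordinary autoregressive decoding); as acceptance improves, tokens per pass approaches the draft length, which is where the "3x faster inference" class of gains comes from.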
Nemotron 3 Super ranked first on both DeepResearch Bench and DeepResearch Bench II at launch -- benchmarks designed to evaluate models on complex, multi-step research tasks requiring sustained reasoning over long contexts. On Artificial Analysis, it ranked top among open models of comparable size for efficiency and overall capability.
The model is available under a permissive open-weight license via Hugging Face, build.nvidia.com, Perplexity, OpenRouter, and major cloud providers. Its combination of open weights, 1M context, and strong reasoning benchmarks positions it as one of the most capable openly available models as of March 2026.
Last updated: March 13, 2026