Llama 3.1
Meta's open-weight large language model family released in July 2024, available in 8B, 70B, and 405B parameter sizes with a 128k token context window.
Llama 3.1 is a family of open-weight language models released by Meta AI in July 2024. It shipped in three sizes — 8B, 70B, and 405B parameters — making the 405B variant the largest open-weight model available at the time of release. All three sizes support a 128,000 token context window and were trained on over 15 trillion tokens of publicly available data.
The 8B variant became particularly significant in the AI ecosystem. Small enough to run on consumer hardware and fast enough for real-time applications, it became the default model for edge deployment, local inference with tools like Ollama, and cost-sensitive production workloads. Taalas later hardwired Llama 3.1 8B directly into silicon with their HC1 chip, achieving 17,000 tokens per second — a use case that demonstrated how a fixed, well-understood model could be optimized far beyond what general-purpose hardware allows.
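As a sketch of what local inference with the 8B model looks like, the example below builds a request for Ollama's `/api/chat` endpoint and sends it to a locally running server. The endpoint path, default port (11434), and the `llama3.1:8b` model tag reflect Ollama's standard setup; the helper function names are illustrative, and the call only succeeds if Ollama is installed and the model has been pulled.

```python
import json
import urllib.request

# Default endpoint for a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(prompt, model="llama3.1:8b"):
    """Build the JSON payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one complete response instead of a stream
    }

def chat(prompt, model="llama3.1:8b"):
    """Send a single chat turn to the local Ollama server and
    return the model's reply text. Requires `ollama serve` running
    and `ollama pull llama3.1:8b` done beforehand."""
    payload = json.dumps(build_chat_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Because the model weights are open, the same workflow works offline with no API key, which is a large part of why the 8B variant became the default for local and edge use.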
The 405B model competed with proprietary frontier models on major benchmarks while being fully open-weight under Meta's community license. This made it the first open-weight model to credibly challenge closed-source alternatives at the frontier, accelerating the debate about whether open or closed model development would dominate the industry. Llama 3.1 laid the groundwork for Llama 4, which introduced mixture-of-experts architectures with the Scout and Maverick variants.
Last updated: March 8, 2026