Temperature
A generation parameter that controls the randomness of token sampling during LLM text generation by scaling the logits before applying softmax.
Temperature is a scalar that divides the model's output logits before the softmax function: softmax(x/tau). It controls how peaked or flat the resulting probability distribution is, directly affecting the diversity and creativity of generated text.
At low temperature (tau approaching 0), softmax becomes nearly one-hot -- almost all probability mass goes to the highest-scoring token, making generation deterministic and often repetitive. At temperature 1.0, the model's original distribution is used unchanged. At high temperature (tau > 1), the distribution becomes flatter and closer to uniform, making less probable tokens more likely to be sampled, which increases creativity but also the risk of incoherent or nonsensical output.
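A minimal sketch of this scaling, using NumPy (the function name and example logits are illustrative, not from any particular library):

```python
import numpy as np

def softmax_with_temperature(logits, tau):
    """Divide logits by tau, then apply a numerically stable softmax."""
    scaled = np.asarray(logits, dtype=float) / tau
    scaled -= scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])

# Low temperature: nearly all mass on the top token (near one-hot).
p_low = softmax_with_temperature(logits, 0.1)
# Temperature 1.0: the model's original distribution.
p_mid = softmax_with_temperature(logits, 1.0)
# High temperature: flatter, closer to uniform.
p_high = softmax_with_temperature(logits, 2.0)
```

Printing the three distributions shows the top token's probability shrinking and the tail tokens' probabilities growing as tau increases.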
In practice, temperature is one of the most important generation-time parameters. Customer-facing chatbots typically use low temperatures (0.1-0.3) for consistent, factual responses. Creative writing applications use higher temperatures (0.7-1.0). Values above 1.5 generally produce text that is too random to be useful. Temperature interacts with other sampling parameters like top-k and top-p (nucleus sampling) to give fine-grained control over the generation quality-diversity tradeoff.
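How temperature composes with nucleus (top-p) sampling can be sketched as follows; this is an illustrative implementation, not the code of any specific inference library, and the function name and defaults are assumptions:

```python
import numpy as np

def sample_token(logits, tau=0.7, top_p=0.9, rng=None):
    """Sample one token id: temperature-scale the logits, then restrict
    sampling to the top-p nucleus and renormalize. Real decoders apply
    the same steps at every generation step."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / tau
    scaled -= scaled.max()  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    # Nucleus: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, sorted by descending probability.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=kept_probs)
```

Lowering tau concentrates probability mass before the nucleus cutoff is applied, so the two parameters compound: a low temperature shrinks the nucleus, and a low top-p further truncates whatever tail the temperature left.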
Last updated: February 22, 2026