Prefix Tuning
A PEFT method that learns small trainable prefix vectors prepended to a model's internal attention states, steering generation without updating the full model.
Like whispering a learned stage direction into an actor's earpiece before every scene instead of retraining the actor from scratch.
Prefix tuning is a Parameter-Efficient Fine-Tuning (PEFT) method that adapts a frozen model by learning a set of trainable vectors, called prefixes, that are injected into the model's attention mechanism. These prefixes are not ordinary text tokens. They are learned continuous representations that influence the model's behavior during generation.
In transformer language models, the learned prefix is typically attached to the key and value states used by attention. This gives the model a task-specific "setup" before it processes the actual user input, steering outputs without modifying the main pretrained weights.
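The mechanism above can be sketched in a few lines of NumPy. This is a minimal, hypothetical single-head attention step, not a real model: the "frozen" query/key/value states are random stand-ins, and the prefix vectors are the only parameters a trainer would update. The point is that prepending learned key/value rows changes the attention output without touching any model weight.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

rng = np.random.default_rng(0)
d_model, seq_len, prefix_len = 8, 4, 2

# Frozen "model" states for the actual user input (stand-ins here).
q = rng.normal(size=(seq_len, d_model))
k = rng.normal(size=(seq_len, d_model))
v = rng.normal(size=(seq_len, d_model))

# Learned prefix key/value vectors: the only trainable parameters.
prefix_k = rng.normal(size=(prefix_len, d_model))
prefix_v = rng.normal(size=(prefix_len, d_model))

# Prepend the prefix to keys and values; queries come only from real tokens,
# so the output keeps the input's sequence length.
k_aug = np.concatenate([prefix_k, k], axis=0)
v_aug = np.concatenate([prefix_v, v], axis=0)

out_plain = attention(q, k, v)
out_prefixed = attention(q, k_aug, v_aug)

print(out_plain.shape, out_prefixed.shape)   # (4, 8) (4, 8)
print(np.allclose(out_plain, out_prefixed))  # False: the prefix steered the output
```

In a real transformer this prepending happens at every layer, and the prefix vectors are optimized by backpropagation while the base weights stay frozen.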
Advantages
Prefix tuning can be extremely parameter-efficient because the learned prefix is tiny relative to the full model. It is especially attractive when you want to steer generation behavior while keeping the base model fully frozen. Because the core weights never change, many tasks can share a single base checkpoint, with only a small prefix stored per task, which simplifies checkpoint management and deployment.
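To make the parameter-efficiency claim concrete, here is a back-of-the-envelope count. The model figures below are GPT-2-small's (12 layers, hidden size 768, roughly 124M parameters), and the prefix length of 20 is an illustrative choice, not a prescribed value.

```python
# Prefix tuning learns one key vector and one value vector per prefix
# position, per layer; the base model's parameters stay frozen.
num_layers, hidden = 12, 768   # GPT-2-small-like dimensions
prefix_len = 20                # illustrative prefix length

prefix_params = num_layers * 2 * prefix_len * hidden
full_params = 124_000_000      # approximate GPT-2-small total

print(prefix_params)                # 368640 trainable parameters
print(prefix_params / full_params)  # ~0.003, i.e. about 0.3% of the model
```

Storing a few hundred thousand numbers per task, instead of a full 124M-parameter copy of the model, is what makes per-task checkpoints cheap.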
Disadvantages
Because the method works by steering attention through learned prefixes rather than directly modifying model weights, it may be less expressive than methods like LoRA for some tasks. It can also be less intuitive to debug, since the learned control signal lives in internal hidden states rather than in obvious weight updates or inserted layers.
Example
A team building a report generator might train one prefix for concise executive summaries and another for long-form analytical explanations, using the same base model under both modes.
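The two-mode setup can be sketched as follows. This is a hypothetical toy, again using a single NumPy attention step in place of a real model: two separately "trained" prefixes (random stand-ins here) are selected at inference time, while the frozen base computation is identical for both modes.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frozen_attention(q, k, v, prefix_kv):
    # The base projections producing q/k/v never change; only the
    # prepended prefix key/value rows differ between modes.
    pk, pv = prefix_kv
    k_aug = np.concatenate([pk, k], axis=0)
    v_aug = np.concatenate([pv, v], axis=0)
    return softmax(q @ k_aug.T / np.sqrt(q.shape[-1])) @ v_aug

rng = np.random.default_rng(42)
d, seq_len, prefix_len = 8, 3, 2
q = rng.normal(size=(seq_len, d))
k = rng.normal(size=(seq_len, d))
v = rng.normal(size=(seq_len, d))

# One learned prefix per output mode (hypothetical stand-in values).
modes = {
    "executive_summary": (rng.normal(size=(prefix_len, d)),
                          rng.normal(size=(prefix_len, d))),
    "long_form_analysis": (rng.normal(size=(prefix_len, d)),
                           rng.normal(size=(prefix_len, d))),
}

out_summary = frozen_attention(q, k, v, modes["executive_summary"])
out_analysis = frozen_attention(q, k, v, modes["long_form_analysis"])
print(np.allclose(out_summary, out_analysis))  # False: same base, different behavior
```

Swapping a prefix is just swapping a small tensor, so the team can serve both report styles from one copy of the base model.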
Last updated: April 2, 2026