Reinforcement Learning

A machine learning paradigm where an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties.

Reinforcement learning (RL) is a type of machine learning in which an agent interacts with an environment by taking actions, observing the resulting states, and receiving reward signals. The agent's goal is to learn a policy (a mapping from states to actions) that maximizes cumulative reward over time. Unlike supervised learning, RL does not require labeled input-output pairs; instead, the agent discovers effective behavior through trial and error, using only the rewards it receives as feedback.
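The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: `CorridorEnv`, its reward values, and the helper names are hypothetical, and the discount factor `gamma` is an added assumption (a common way to define cumulative reward, though the definition above does not specify it).

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: states 0..length-1, start at 0, goal at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left; walls clamp the position
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: small penalty per step, +1 on reaching the goal
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode: the agent acts, the environment returns a state and a reward."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)                    # policy maps state -> action
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards, gamma=0.9):
    """Cumulative reward with discounting: r0 + gamma*r1 + gamma^2*r2 + ..."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def always_right(state):
    return 1


def random_policy(state):
    return random.choice([0, 1])
```

Calling `discounted_return(run_episode(CorridorEnv(), always_right))` scores the fixed "always move right" policy; swapping in `random_policy` typically gives a lower return, which is the kind of difference an RL algorithm learns to exploit.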

Key concepts in RL include the state space, action space, reward function, value function, and policy. Algorithms are broadly divided into model-free methods (such as Q-learning, policy gradient, and actor-critic methods), which learn directly from experience, and model-based methods, which learn a model of the environment and use it to plan. Deep reinforcement learning combines deep neural networks with RL algorithms, enabling agents to handle high-dimensional state spaces such as raw images.
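As an illustration of a model-free method, the sketch below runs tabular Q-learning with epsilon-greedy exploration on a tiny corridor task. The environment, its rewards, and the hyperparameters (`alpha`, `gamma`, `epsilon`, the episode count, and the step cap) are all assumptions chosen for the example rather than recommended settings.

```python
import random

# Hypothetical toy task: 5 states in a row, start at 0, goal at state 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def step(state, action):
    """Environment dynamics: action 1 moves right, action 0 moves left."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == GOAL
    reward = 1.0 if done else -0.1   # assumed reward scheme
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values Q[state][action] from experience alone."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 200:
            # Epsilon-greedy: mostly exploit the current estimates, sometimes explore
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Derive a policy from the value table by picking the best action per state."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]
```

Here the table `q` plays the role of the value function and `greedy_policy(q)` is the policy derived from it. Deep RL methods replace the table with a neural network so the same idea can scale to inputs such as images.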

Reinforcement learning has achieved landmark results in games, including defeating top professional Go players (AlphaGo), matching or exceeding human performance on many Atari games, and reaching Grandmaster-level play in StarCraft II. Beyond games, RL is applied in robotics, autonomous driving, recommendation systems, and increasingly in fine-tuning large language models through RLHF (reinforcement learning from human feedback), where human preferences, typically captured by a learned reward model, serve as the reward signal.

Last updated: February 20, 2026
