Batch Normalization
Deep Learning
A technique that normalizes the inputs of each layer in a neural network across the current mini-batch, stabilizing and accelerating training.
Batch normalization (BatchNorm) is a technique introduced by Ioffe and Szegedy in 2015 that normalizes the activations of each layer by adjusting and scaling them based on the mean and variance computed over the current mini-batch. After normalization, learned scale and shift parameters are applied, allowing the network to undo the normalization if that is optimal for the task.
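The per-batch normalization followed by the learned scale and shift can be sketched in NumPy as follows (a minimal forward pass only; the function name, shapes, and epsilon value are illustrative assumptions, not a reference implementation):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then apply the
    learned scale (gamma) and shift (beta) parameters."""
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # scale and shift (can undo normalization)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # batch of 32 samples, 4 features
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # each feature now has mean near 0
print(out.std(axis=0))   # and standard deviation near 1
```

With `gamma` set to ones and `beta` to zeros the output is just the normalized activations; during training, gradient descent adjusts both so the network can recover any preferred scale and offset.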
The primary benefits of batch normalization include faster training convergence, the ability to use higher learning rates, and reduced sensitivity to weight initialization. It also provides a mild regularization effect, reducing the need for other regularization techniques like dropout in some cases. During inference, running averages of the batch statistics computed during training are used instead of per-batch statistics.
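The running averages used at inference are typically maintained as an exponential moving average of the batch statistics. A minimal sketch, assuming a momentum of 0.1 (a common library default, e.g. in PyTorch, but an assumption here):

```python
import numpy as np

def update_running_stats(batch, running_mean, running_var, momentum=0.1):
    """Blend the current batch statistics into the running averages.
    momentum=0.1 is an assumed default, not prescribed by the paper."""
    mu, var = batch.mean(axis=0), batch.var(axis=0)
    running_mean = (1 - momentum) * running_mean + momentum * mu
    running_var = (1 - momentum) * running_var + momentum * var
    return running_mean, running_var

def batchnorm_inference(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """At inference, normalize with the stored statistics, not the batch's."""
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
running_mean, running_var = np.zeros(3), np.ones(3)
for _ in range(200):  # simulate 200 training batches drawn from N(2, 0.5^2)
    batch = rng.normal(loc=2.0, scale=0.5, size=(64, 3))
    running_mean, running_var = update_running_stats(batch, running_mean, running_var)

# the running stats converge toward the data distribution (mean 2, variance 0.25)
print(running_mean, running_var)
```

Because inference uses these fixed statistics rather than per-batch ones, a single example produces the same output regardless of what else is in the batch.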
Batch normalization has become a standard component in many deep learning architectures, particularly in convolutional neural networks for computer vision. However, it has limitations: it works poorly with very small batch sizes and can behave differently during training and inference. Alternatives like Layer Normalization (used in transformers), Group Normalization, and Instance Normalization have been developed for scenarios where BatchNorm is suboptimal.
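The difference between BatchNorm and an alternative like Layer Normalization comes down to which axis the statistics are computed over, which this small sketch illustrates (the variable names are illustrative):

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)  # batch of 3 samples, 4 features

# BatchNorm: one mean per feature, computed across the batch (axis 0)
bn_mean = x.mean(axis=0)                 # shape (4,), depends on the whole batch
# LayerNorm: one mean per sample, computed across features (axis 1)
ln_mean = x.mean(axis=1, keepdims=True)  # shape (3, 1), independent of batch size

print(bn_mean)  # unreliable when the batch is tiny
print(ln_mean)  # well-defined even with batch size 1
```

This is why LayerNorm suits settings such as transformers or small-batch training: its statistics do not degrade as the batch shrinks.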
Last updated: February 20, 2026