Dropout
Deep Learning
A regularization technique that randomly sets a fraction of neuron activations to zero during each training step, preventing co-adaptation and reducing overfitting.
Imagine randomly covering some players' eyes during practice so the team does not rely too heavily on any one player.
Dropout, introduced by Srivastava et al. in 2014, works by randomly zeroing out neuron activations with a given probability p (typically 0.5 for hidden layers) during each forward pass in training. In the now-standard "inverted dropout" formulation, the surviving activations are scaled by 1/(1-p) so that the expected value of each activation is unchanged; at test time, all neurons are used and no scaling is needed.
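The forward pass above can be sketched in a few lines of NumPy. This is a minimal illustration of inverted dropout, not any particular framework's implementation; the function name and signature are our own.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p during
    training, scaling survivors by 1/(1-p) to preserve the expected value."""
    if not training or p == 0.0:
        return x  # test time: use all neurons, no rescaling needed
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p   # True for the activations that survive
    return x * mask / (1.0 - p)
```

Because of the 1/(1-p) scaling, the mean activation is roughly preserved: applying `dropout` with p=0.5 to an all-ones vector yields values of 0 or 2, averaging near 1.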
The technique works through multiple complementary mechanisms. First, it prevents co-adaptation: neurons cannot rely on specific other neurons being present, forcing each to learn more robust, independently useful features. Second, it acts as an implicit ensemble: each training step samples a different random sub-network, and the full network at test time approximates the averaged prediction of exponentially many (2^n, for n droppable units) sub-networks. For linear models, dropout on the inputs is equivalent to a form of adaptive L2 regularization in which each weight's penalty is scaled by the magnitude of the corresponding input feature.
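The ensemble view can be checked numerically: averaging the predictions of many randomly masked sub-networks (with the inverted 1/(1-p) scaling) recovers the full model's output in expectation. The Monte Carlo sketch below illustrates this for a single linear unit; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 0.5
x = rng.normal(size=n)      # input activations to one linear unit
w = rng.normal(size=n)      # the unit's weights
full = x @ w                # test-time output: all inputs active, no masking

# Sample 20,000 random sub-networks and average their predictions.
masks = rng.random((20000, n)) >= p
sub_preds = (x * masks / (1.0 - p)) @ w

print(full, sub_preds.mean())   # the two values should be close
```

The match is only in expectation; with a finite sample the average fluctuates around the full-network output, which is why frameworks simply use the full network at test time instead of averaging sampled sub-networks.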
In practice, a dropout rate of 0.5 is common for hidden layers, 0.2 for input layers, and 0 for output layers. Convolutional layers typically use lower rates (0.1-0.2) or spatial dropout, which drops entire feature maps rather than individual activations. Dropout is one of the most widely used regularization techniques in deep learning and was a key innovation in making large neural networks trainable without catastrophic overfitting.
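The spatial dropout variant mentioned above can be sketched as follows. This assumes an (N, C, H, W) activation layout and is an illustrative NumPy version, not a specific framework's API.

```python
import numpy as np

def spatial_dropout(x, p=0.1, rng=None):
    """Spatial dropout for conv activations shaped (N, C, H, W): drop whole
    feature maps rather than individual pixels, since neighboring positions
    in a feature map are strongly correlated and per-pixel masking would
    regularize little."""
    rng = rng or np.random.default_rng()
    keep = rng.random((x.shape[0], x.shape[1], 1, 1)) >= p  # one draw per channel
    return x * keep / (1.0 - p)                             # inverted scaling

acts = np.ones((2, 8, 4, 4))   # toy batch of conv activations
out = spatial_dropout(acts, p=0.5, rng=np.random.default_rng(1))
```

Each (sample, channel) slice of `out` is either entirely zero or entirely scaled by 1/(1-p), reflecting that whole feature maps are dropped together.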
Last updated: February 22, 2026