>_TheQuery

Dropout

Deep Learning

A regularization technique that randomly sets a fraction of neuron activations to zero during each training step, preventing co-adaptation and reducing overfitting.

Dropout, introduced by Srivastava et al. in 2014, randomly zeroes neuron activations with probability p (typically 0.5 for hidden layers) during each training forward pass. In the now-standard "inverted dropout" formulation, the surviving activations are scaled by 1/(1-p) so their expected value matches the test-time activation; at test time, all neurons are used and no scaling is applied. (The original paper instead scaled weights by 1-p at test time, which is mathematically equivalent.)
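The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration of inverted dropout, not a production implementation; the function name and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p,
    scale survivors by 1/(1-p) so the expected value is unchanged."""
    if not training or p == 0.0:
        return x  # test time: identity, no mask and no scaling
    mask = rng.random(x.shape) >= p      # keep with probability 1 - p
    return x * mask / (1.0 - p)          # rescale the surviving activations

x = np.ones((4, 8))
y = dropout_forward(x, p=0.5)            # entries are either 0.0 or 2.0
```

With p = 0.5, every surviving unit is doubled, so the mean activation stays near 1.0 even though half the units are zeroed on average.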

The technique works through multiple complementary mechanisms. First, it prevents co-adaptation: a neuron cannot rely on specific other neurons being present, so each is forced to learn features that are useful on their own. Second, it acts as an implicit ensemble: each training step samples a different random sub-network, and the full network at test time approximates the average prediction of exponentially many (2^n, for n droppable units) sub-networks. For linear models, dropout on the inputs is closely related to L2 regularization: it is equivalent to a form of L2 penalty whose strength adapts to each feature.
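The ensemble-averaging claim is exact for a single linear layer with dropout on its input: averaging over random masks recovers the deterministic full-network output. A small Monte Carlo check (illustrative names and sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))   # a toy linear "network"
x = rng.normal(size=5)
p = 0.5

# Average the outputs of many random sub-networks (inverted dropout on x).
samples = []
for _ in range(20000):
    mask = rng.random(5) >= p
    samples.append(W @ (x * mask / (1 - p)))
mc_mean = np.mean(samples, axis=0)

full = W @ x                  # deterministic test-time forward pass
# mc_mean converges to full as the number of sampled masks grows
```

For deep nonlinear networks the test-time forward pass is only an approximation to this average, but in practice it works well.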

In practice, a dropout rate of 0.5 is common for hidden layers, around 0.2 for input layers, and none for the output layer. Convolutional layers typically use lower rates (0.1-0.2) or spatial dropout, which drops entire feature maps rather than individual activations. Dropout remains one of the most widely used regularization techniques in deep learning and was a key innovation in making large neural networks trainable without catastrophic overfitting.
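Spatial dropout, mentioned above for convolutional layers, makes one keep/drop decision per feature map instead of per activation, since neighboring pixels within a map are strongly correlated. A hedged NumPy sketch, assuming NCHW-shaped feature tensors:

```python
import numpy as np

rng = np.random.default_rng(2)

def spatial_dropout(x, p=0.1, training=True):
    """Channel-wise (spatial) dropout for conv features.

    x has shape (batch, channels, height, width); each channel is
    kept or zeroed as a whole, then survivors are scaled by 1/(1-p).
    Illustrative sketch, not a library implementation."""
    if not training or p == 0.0:
        return x
    mask = rng.random((x.shape[0], x.shape[1], 1, 1)) >= p  # one draw per channel
    return x * mask / (1.0 - p)                             # broadcasts over H, W

x = np.ones((2, 16, 8, 8))
y = spatial_dropout(x, p=0.25)   # each 8x8 map is all-zero or uniformly scaled
```

Because the mask broadcasts over the spatial dimensions, every value within a given feature map shares the same fate, which is what makes this variant effective for convolutional activations.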

Last updated: February 22, 2026