
Hyperparameter

Fundamentals

A configuration value set before training begins that controls the learning process itself, as opposed to model parameters which are learned from data.

Hyperparameters are the knobs you set before training that determine how the model learns. Unlike model parameters (weights, biases) that are learned from data via gradient descent, hyperparameters are chosen by the practitioner and include learning rate, batch size, number of layers, number of neurons per layer, regularization strength (lambda), dropout rate, and number of training epochs.
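The distinction is easiest to see in code. In this minimal sketch (a toy linear model fit by gradient descent; the data and variable names are illustrative, not from any particular library), the learning rate and epoch count are fixed up front by the practitioner, while the weight and bias are learned from data:

```python
import numpy as np

# Hyperparameters: chosen by the practitioner before training starts.
learning_rate = 0.1
n_epochs = 200

# Toy data: y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=50)
y = 2.0 * X + 1.0 + rng.normal(scale=0.1, size=50)

# Model parameters: learned from data via gradient descent on the MSE loss.
w, b = 0.0, 0.0
for _ in range(n_epochs):
    pred = w * X + b
    grad_w = 2 * np.mean((pred - y) * X)  # d(MSE)/dw
    grad_b = 2 * np.mean(pred - y)        # d(MSE)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # should land near 2.0 and 1.0
```

Nothing in the loop ever updates `learning_rate` or `n_epochs`; changing them means rerunning training from scratch, which is exactly why tuning them is a search problem of its own.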

Hyperparameter selection is critical and surprisingly difficult. The learning rate alone can determine whether training succeeds or fails entirely: set it too high and the loss diverges; set it too low and training crawls or stalls in a poor region. Common approaches include grid search (trying all combinations from a predefined set), random search (which is often more efficient than grid search), and Bayesian optimization (which uses past results to intelligently choose what to try next). Cross-validation is used to evaluate each hyperparameter setting's generalization performance.
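As a concrete sketch of random search combined with cross-validation (pure numpy, with ridge regression standing in for the model; the helper names and data here are illustrative assumptions): candidate regularization strengths are sampled log-uniformly, and each one is scored by k-fold cross-validated error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression data with a known sparse weight vector.
X = rng.normal(size=(100, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)

def ridge_fit(X_tr, y_tr, lam):
    # Closed-form ridge solution: (X'X + lam * I)^-1 X'y
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def cv_mse(lam, k=5):
    # k-fold cross-validation estimate of held-out MSE for one setting.
    folds = np.array_split(np.arange(len(X)), k)
    errs = []
    for fold in folds:
        mask = np.ones(len(X), dtype=bool)
        mask[fold] = False
        w = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(errs))

# Random search: sample lambda log-uniformly rather than walking a fixed grid.
candidates = 10 ** rng.uniform(-3, 2, size=20)
best_lam = min(candidates, key=cv_mse)
print(f"best lambda: {best_lam:.4g}, CV MSE: {cv_mse(best_lam):.3f}")
```

The log-uniform sampling is the point: random search covers many more distinct values per axis than a grid of the same budget, which is the usual argument for its efficiency.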

A common trap is tuning hyperparameters on the test set, which leaks test information into the model and produces overly optimistic performance estimates. The correct approach uses a separate validation set (or cross-validation on the training set) for hyperparameter tuning, reserving the test set for final evaluation only. In production, hyperparameter tuning is often automated with frameworks like Optuna or Ray Tune.
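The correct workflow can be sketched in a few lines (a toy polynomial-degree example with illustrative data; the split sizes are arbitrary assumptions): the hyperparameter is chosen by validation error alone, and the test set is consulted exactly once at the end.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy cubic data, split three ways: train / validation / test.
x = rng.uniform(-2, 2, size=150)
y = x**3 - x + rng.normal(scale=1.0, size=150)
x_tr, x_val, x_te = x[:90], x[90:120], x[120:]
y_tr, y_val, y_te = y[:90], y[90:120], y[120:]

def val_mse(degree):
    # Fit on training data only, score on the validation set.
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))

# Tune the hyperparameter (polynomial degree) against the validation set...
best_deg = min(range(1, 10), key=val_mse)

# ...then touch the test set exactly once, for the final unbiased estimate.
coeffs = np.polyfit(x_tr, y_tr, best_deg)
test_mse = float(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))
print(f"best degree: {best_deg}, test MSE: {test_mse:.2f}")
```

Had `best_deg` been chosen to minimize test error instead, the reported test MSE would understate the true generalization error, which is precisely the leak the paragraph above warns against.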

Last updated: February 22, 2026