Chapter 3 - Classical Machine Learning: Thinking in Features
The Crux
Neural networks get all the hype, but most production ML is still "classical" methods: linear models, decision trees, ensembles. Why? They're interpretable, debuggable, and often work better with small data. This chapter is about thinking in features, not layers.
Why Linear Models Still Dominate Industry
Walk into any real ML deployment, and you'll find:
- Banks: Logistic regression for credit scores
- Ad platforms: Linear models for click prediction
- Fraud detection: Gradient boosted trees
Why not deep learning everywhere?
Reason #1: Interpretability
Regulators, auditors, and customers ask: "Why was this decision made?"
Linear model: "Income weighted 0.3, debt ratio weighted -0.5, result was 0.7 > threshold."
Neural network: "Uh, 50 million parameters multiplied through 20 layers produced 0.7."
Guess which one the bank's legal team approves?
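That "weighted sum you can read aloud" property is easy to demonstrate. Here is a minimal sketch using scikit-learn's LogisticRegression on synthetic data; the feature names, data, and weights are all made up for illustration, not a real credit model.

```python
# Hypothetical credit-scoring sketch: a logistic regression whose decision
# is a weighted sum you can explain coefficient by coefficient.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["income", "debt_ratio", "years_employed"]
X = rng.normal(size=(200, 3))
# Synthetic labels: driven positively by income, negatively by debt ratio
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# The "why" is just weight-times-value per feature, printable for an auditor
applicant = X[0]
for name, weight, value in zip(features, model.coef_[0], applicant):
    print(f"{name}: weight {weight:+.2f} x value {value:+.2f}")
score = model.predict_proba([applicant])[0, 1]
print(f"approval score: {score:.2f}")
```

Every term in that printout is directly inspectable, which is exactly what a regulator or legal team wants to see.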
Reason #2: Sample Efficiency
Deep learning needs massive data. 10,000 examples? A neural net will overfit. A regularized linear model will generalize.
Rule of thumb: <100k examples? Try classical ML first.
Reason #3: Debugging
When a linear model fails:
- Check feature distributions
- Look at coefficients
- Test on slices
When a neural net fails:
- ¯\_(ツ)_/¯
- Check everything
- Pray
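The linear-model checklist above can be sketched in a few lines. The model, features, and slice definition here are hypothetical stand-ins; the point is that each debugging step is one line of inspectable code.

```python
# Debugging a linear model: distributions, coefficients, slices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# 1. Check feature distributions (catches drift, scaling bugs, dead features)
print("means:", X.mean(axis=0).round(2), "stds:", X.std(axis=0).round(2))

# 2. Look at coefficients (a near-zero or wrong-sign weight is a clue)
print("coefficients:", model.coef_[0].round(2))

# 3. Test on slices (here: rows where feature 1 is unusually large)
slice_mask = X[:, 1] > 1.0
print("overall acc:", round(model.score(X, y), 2))
print("slice acc:  ", round(model.score(X[slice_mask], y[slice_mask]), 2))
```

If the slice accuracy craters, you know where to look. With a 50-million-parameter network, there is no equivalent three-line triage.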
Reason #4: Speed
Linear model prediction: microseconds. Neural network prediction: milliseconds (or worse).
At scale, milliseconds matter. Ad auctions, fraud detection, recommendation serving: latency is money.
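The microsecond claim is easy to sanity-check: a linear prediction is one dot product plus a bias. The weight count and timing loop below are arbitrary, and exact numbers depend on your machine, but the order of magnitude is the point.

```python
# A linear model's entire "forward pass" is a single dot product.
import time
import numpy as np

w = np.random.default_rng(0).normal(size=200)   # 200 learned weights
b = -0.1
x = np.random.default_rng(1).normal(size=200)   # one incoming request

start = time.perf_counter()
for _ in range(10_000):
    score = x @ w + b                           # the whole prediction
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / 10_000 * 1e6:.1f} microseconds")
```

A deep network replaces that one dot product with millions of them across many layers, which is where the milliseconds come from.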