Chapter 3 - Classical Machine Learning: Thinking in Features
The Crux
Neural networks get all the hype, but most production ML is still "classical" methods: linear models, decision trees, ensembles. Why? They're interpretable, debuggable, and often work better with small data. This chapter is about thinking in features, not layers.
Why Linear Models Still Dominate Industry
Walk into any real ML deployment, and you'll find:
- Banks: Logistic regression for credit scores
- Ad platforms: Linear models for click prediction
- Fraud detection: Gradient boosted trees
Why not deep learning everywhere?
Reason #1: Interpretability
Regulators, auditors, and customers ask: "Why was this decision made?"
Linear model: "Income weighted 0.3, debt ratio weighted -0.5, result was 0.7 > threshold."
Neural network: "Uh, 50 million parameters multiplied through 20 layers produced 0.7."
Guess which one the bank's legal team approves?
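That "weighted sum you can read aloud" property is easy to demonstrate. Here is a minimal sketch using scikit-learn's LogisticRegression on synthetic data; the feature names, data, and weights are all made up for illustration, not a real credit model.

```python
# Hypothetical credit-scoring sketch: a logistic regression whose decision
# is a weighted sum you can explain coefficient by coefficient.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["income", "debt_ratio", "years_employed"]
X = rng.normal(size=(200, 3))
# Synthetic labels: driven positively by income, negatively by debt ratio
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# The "why" is just weight-times-value per feature, printable for an auditor
applicant = X[0]
for name, weight, value in zip(features, model.coef_[0], applicant):
    print(f"{name}: weight {weight:+.2f} x value {value:+.2f}")
score = model.predict_proba([applicant])[0, 1]
print(f"approval score: {score:.2f}")
```

Every term in that printout is directly inspectable, which is exactly what a regulator or legal team wants to see.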
Reason #2: Sample Efficiency
Deep learning needs massive data. 10,000 examples? A neural net will overfit. A regularized linear model will generalize.
Rule of thumb: <100k examples? Try classical ML first.
Reason #3: Debugging
When a linear model fails:
- Check feature distributions
- Look at coefficients
- Test on slices
When a neural net fails:
- ¯\_(ツ)_/¯
- Check everything
- Pray
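The linear-model checklist above can be sketched in a few lines. The model, features, and slice definition here are hypothetical stand-ins; the point is that each debugging step is one line of inspectable code.

```python
# Debugging a linear model: distributions, coefficients, slices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# 1. Check feature distributions (catches drift, scaling bugs, dead features)
print("means:", X.mean(axis=0).round(2), "stds:", X.std(axis=0).round(2))

# 2. Look at coefficients (a near-zero or wrong-sign weight is a clue)
print("coefficients:", model.coef_[0].round(2))

# 3. Test on slices (here: rows where feature 1 is unusually large)
slice_mask = X[:, 1] > 1.0
print("overall acc:", round(model.score(X, y), 2))
print("slice acc:  ", round(model.score(X[slice_mask], y[slice_mask]), 2))
```

If the slice accuracy craters, you know where to look. With a 50-million-parameter network, there is no equivalent three-line triage.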
Reason #4: Speed
Linear model prediction: microseconds. Neural network prediction: milliseconds (or worse).
At scale, milliseconds matter. Ad auctions, fraud detection, recommendation serving: latency is money.
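The microsecond claim is easy to sanity-check: a linear prediction is one dot product plus a bias. The weight count and timing loop below are arbitrary, and exact numbers depend on your machine, but the order of magnitude is the point.

```python
# A linear model's entire "forward pass" is a single dot product.
import time
import numpy as np

w = np.random.default_rng(0).normal(size=200)   # 200 learned weights
b = -0.1
x = np.random.default_rng(1).normal(size=200)   # one incoming request

start = time.perf_counter()
for _ in range(10_000):
    score = x @ w + b                           # the whole prediction
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / 10_000 * 1e6:.1f} microseconds")
```

A deep network replaces that one dot product with millions of them across many layers, which is where the milliseconds come from.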