Cheatsheet: What machine learning actually is
Rules vs learning from data

| | Traditional programming | Machine learning |
|---|---|---|
| You provide | The rules (logic) | Examples (inputs plus answers) |
| Machine provides | Execution | The learned rules |
| Intelligence lives in | Your code | The data |
| Best when | The rule is simple and known | The rules are too numerous, fuzzy, or unknown, and you have data |
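
The contrast in the table can be sketched in a few lines. This is a minimal illustration, not a recommended workflow: the 8% rate, the example prices, and the names `tax_by_rule` and `learn_rate` are all made up for the sketch. The first function is a hand-written rule; the second estimates the same rate from examples using a least-squares slope through the origin.

```python
# Traditional programming: you provide the rule, the machine executes it.
def tax_by_rule(price: float, rate: float = 0.08) -> float:
    """Hand-written rule; the 8% rate is an assumed example value."""
    return price * rate


# Machine learning: you provide examples, the machine recovers the rule.
def learn_rate(prices: list[float], taxes: list[float]) -> float:
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    numerator = sum(p * t for p, t in zip(prices, taxes))
    denominator = sum(p * p for p in prices)
    return numerator / denominator


# Hypothetical labeled examples (inputs plus answers).
prices = [10.0, 25.0, 40.0, 100.0]
taxes = [tax_by_rule(p) for p in prices]

learned_rate = learn_rate(prices, taxes)
print(f"learned rate: {learned_rate:.4f}")
```

The learned rate matches the hand-written one only because the examples were generated from that rule. When the rule is this simple and already known, writing it directly remains the better choice.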
The two families

| Family | Has labels? | Goal | Sub-types |
|---|---|---|---|
| Supervised | Yes | Predict the answer for new inputs | Regression (number), Classification (category) |
| Unsupervised | No | Find structure in the data | Clustering, Dimensionality reduction |
| Reinforcement (out of scope) | No (uses reward) | Learn by trial and error | n/a here |
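
To make the unsupervised row concrete, here is a rough sketch of clustering with no labels at all. It is a toy one-dimensional version of k-means with two clusters; the function name `two_means_1d`, the starting points, and the data are assumptions made for illustration only.

```python
def two_means_1d(values: list[float], iterations: int = 10) -> tuple[float, float]:
    """Toy 1-D k-means with k=2, started at the smallest and largest value."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to whichever center is closer.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_low or not near_high:
            break  # Everything collapsed into one group; nothing more to do.
        # Move each center to the mean of its group.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


# Unsupervised data: inputs only, no answers attached (hypothetical values).
unlabeled = [1.0, 1.2, 0.8, 9.0, 9.1, 8.9]

centers = two_means_1d(unlabeled)
print(centers)
```

Nothing told the code which group a value belongs to; the two centers come purely from structure in the inputs. A supervised dataset would instead pair every input with its answer.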
Supervised: regression vs classification

| | Regression | Classification |
|---|---|---|
| Answer type | A number | A category |
| Examples | House price, temperature, units sold | Spam / not-spam, which digit, fraud / legit |
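
The only difference between the two columns is the type of the answer. As a hedged illustration, the sketch below uses one very simple method (copy the answer of the closest training example) for both cases; the function name `nearest_neighbor` and both tiny datasets are hypothetical.

```python
def nearest_neighbor(x: float, examples: list[tuple[float, object]]) -> object:
    """Return the answer attached to the training input closest to x."""
    _, answer = min(examples, key=lambda pair: abs(pair[0] - x))
    return answer


# Regression: answers are numbers (made-up floor area -> price).
house_prices = [(50.0, 150_000.0), (80.0, 240_000.0), (120.0, 360_000.0)]

# Classification: answers are categories (made-up suspicious-word count -> label).
spam_labels = [(0.0, "not-spam"), (1.0, "not-spam"), (7.0, "spam"), (9.0, "spam")]

predicted_price = nearest_neighbor(85.0, house_prices)
predicted_label = nearest_neighbor(6.0, spam_labels)

print(predicted_price)
print(predicted_label)
```

The same code produces a number in one case and a category in the other, which is why the answer type, not the algorithm, decides whether a problem is regression or classification.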
The two-question classification test

| Step | Ask | If… |
|---|---|---|
| 1 | Do I have labeled answers? | Yes -> supervised, go to step 2; No -> unsupervised, or not machine learning |
| 2 | Is the answer a number or a category? | Number -> regression; Category -> classification |
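
The two questions can be written as a small decision function. This is only a sketch of the table's logic; the function name `choose_approach` and the string values `"number"` and `"category"` are assumptions chosen for the example.

```python
from typing import Optional


def choose_approach(has_labels: bool, answer_type: Optional[str] = None) -> str:
    """Apply the two questions in order and name the problem type."""
    # Step 1: do I have labeled answers?
    if not has_labels:
        return "unsupervised, or not machine learning"
    # Step 2: is the answer a number or a category?
    if answer_type == "number":
        return "supervised, regression"
    if answer_type == "category":
        return "supervised, classification"
    raise ValueError("answer_type must be 'number' or 'category' when labels exist")


print(choose_approach(True, "number"))
print(choose_approach(True, "category"))
print(choose_approach(False))
```

Predicting a temperature from labeled history takes the first branch of step 2, a fraud flag takes the second, and grouping shoppers with no categories given stops at step 1.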
When NOT to use machine learning

| Situation | Why a rule wins |
|---|---|
| A simple known rule works | Exact, fast, fully explainable (e.g. sales tax) |
| No data | Nothing to learn from |
| An unexplainable mistake is unacceptable | A statistical black box cannot justify its errors, so it cannot be trusted here |
The one rule that governs everything

| Claim | Verdict |
|---|---|
| “Perfect on the training data” | Proves nothing |
| “Holds up on new, unseen data” | The only test that counts |
| Model memorizes training data | Learned the noise, not the pattern |
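
A small experiment shows why a perfect training score proves nothing. In this sketch, everything is hypothetical: the underlying pattern (`true_label`), the data, and both toy models. The memorizer stores every training pair and guesses `"low"` for anything it has not seen; the threshold model learns a single cut-off halfway between the two class means.

```python
def true_label(x: int) -> str:
    """Made-up underlying pattern used to generate the data."""
    return "high" if x >= 50 else "low"


train = [(x, true_label(x)) for x in [5, 20, 35, 45, 55, 70, 85, 95]]
test = [(x, true_label(x)) for x in [10, 30, 48, 52, 60, 90]]  # unseen inputs

# Model A: memorize the training data, guess "low" for anything unseen.
lookup = dict(train)


def memorizer(x: int) -> str:
    return lookup.get(x, "low")


# Model B: learn one threshold, the midpoint between the class means.
def fit_threshold(examples: list[tuple[int, str]]) -> float:
    lows = [x for x, y in examples if y == "low"]
    highs = [x for x, y in examples if y == "high"]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2


threshold = fit_threshold(train)


def threshold_model(x: int) -> str:
    return "high" if x >= threshold else "low"


def accuracy(model, examples: list[tuple[int, str]]) -> float:
    """Fraction of examples where the model's answer matches the label."""
    return sum(model(x) == y for x, y in examples) / len(examples)


for name, model in [("memorizer", memorizer), ("threshold", threshold_model)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

Both models are perfect on the training data, so that score cannot tell them apart. Only the unseen inputs reveal that the memorizer learned nothing reusable.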
Worked classifications

| Problem | Answer |
|---|---|
| Predict tomorrow’s temperature | Supervised, regression |
| Is this transaction fraud? | Supervised, classification |
| Group shoppers, no categories given | Unsupervised, clustering |
| Compute sales tax | Neither (write the rule) |
| Compress 200 survey questions to a few themes | Unsupervised, dimensionality reduction |