Chapter 3: Supervised Learning

What Is Supervised Learning?

Supervised learning is the task of learning, from pairs of an input $\mathbf{x}$ and its correct label $y$, a function $f$ that predicts the output for new inputs.

$$\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\} \rightarrow f: \mathbf{x} \mapsto y$$

Figure 1: How supervised learning works. Training data (pairs x₁ → y₁, x₂ → y₂, ..., xₙ → yₙ) feeds a learning algorithm that produces a model f, which maps a new input x to a prediction ŷ.

What Is the "Supervisor"?

The correct label $y$ acts as the supervisor. When the model's prediction is wrong, the gap between the prediction and the label (the error) is fed back to correct the model.
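The feedback described above can be sketched with gradient descent on a one-parameter model. This is only an illustration under stated assumptions: the model f(x) = w·x, the squared-error update, the learning rate, and the toy data are all chosen here for brevity and do not come from the chapter.

```python
def train_by_error_feedback(pairs, learning_rate=0.01, epochs=200):
    """Fit f(x) = w * x by repeatedly correcting w with the prediction error."""
    w = 0.0  # hypothetical starting guess
    for _ in range(epochs):
        for x, y in pairs:
            prediction = w * x
            error = prediction - y          # gap between prediction and label
            w -= learning_rate * error * x  # feed the error back into the model
    return w


# Toy data generated from y = 3x (made up for illustration).
pairs = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = train_by_error_feedback(pairs)
```

Each label pulls w toward the value that would have made the prediction correct, so on this toy data w approaches 3. Without the labels there would be no error to feed back, which is why they play the role of the supervisor.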

Regression

Regression is the case where the output $y$ is a continuous value; the task is to predict a numeric quantity.

Figure 2: Regression. A linear regression model is fitted to training data to predict a continuous value (price, vertical axis) from floor area in m² (horizontal axis).
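A linear fit like the one in Figure 2 can be sketched with ordinary least squares for a single feature. The function name `fit_line` and the data points are hypothetical, and the closed-form formula is one common way to fit a line, not necessarily the method the chapter has in mind.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: floor area (m²) and price (tens of thousands of yen).
areas = [40.0, 60.0, 80.0, 100.0]
prices = [2100.0, 2900.0, 4100.0, 4900.0]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 70.0 + intercept  # prediction for an unseen 70 m² home
```

The output is a real number on a continuous scale, which is what makes this a regression problem.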

Examples of Regression Problems

  • Housing price prediction: floor area, building age, location → price (in tens of thousands of yen)
  • Sales forecasting: ad spend, season, day of week → revenue (yen)
  • Temperature forecasting: past weather data → tomorrow's high temperature (°C)
  • Stock price prediction: past price movements → next-day closing price (yen)

Classification

Classification is the case where the output $y$ is a discrete category; the task is to predict a class.

Figure 3: Classification. Two classes (Class A and Class B), plotted over Feature 1 and Feature 2, are separated by a decision boundary.
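One simple way to obtain a decision boundary like the one in Figure 3 is a nearest-centroid classifier, sketched below. The chapter does not prescribe a particular classifier; this technique, the function names, and the data points are assumptions chosen to keep the example short.

```python
def fit_centroids(points, labels):
    """Compute the mean point (centroid) of each class."""
    sums, counts = {}, {}
    for (x1, x2), label in zip(points, labels):
        s1, s2 = sums.get(label, (0.0, 0.0))
        sums[label] = (s1 + x1, s2 + x2)
        counts[label] = counts.get(label, 0) + 1
    return {label: (s1 / counts[label], s2 / counts[label])
            for label, (s1, s2) in sums.items()}


def predict(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    def squared_distance(label):
        c1, c2 = centroids[label]
        return (point[0] - c1) ** 2 + (point[1] - c2) ** 2
    return min(centroids, key=squared_distance)


# Made-up two-feature training data for classes "A" and "B".
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (6.0, 6.0), (7.0, 6.0), (6.0, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

centroids = fit_centroids(points, labels)
label_near_a = predict(centroids, (2.0, 2.0))
label_near_b = predict(centroids, (5.0, 6.0))
```

With two classes, the implied decision boundary is the line of points equidistant from the two centroids. The output is a category, not a number, which is what distinguishes this from regression.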

Examples of Classification Problems

  • Spam detection: email contents → spam / not spam (binary classification)
  • Image recognition: an image → cat / dog / bird / ... (multiclass classification)
  • Medical diagnosis: test results → positive / negative
  • Sentiment analysis: text → positive / negative / neutral

Regression vs. Classification

Aspect             | Regression                     | Classification
Output             | Continuous value (real number) | Discrete value (category)
Examples           | Price, temperature             | Class label
Evaluation metrics | MSE, MAE                       | Accuracy, F1 score
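The four evaluation metrics in the table can be written out in a few lines each. The sketch below uses the standard definitions of mean squared error (MSE), mean absolute error (MAE), accuracy, and the F1 score for a single positive class. The function names and sample values are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # convention assumed here when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Regression: made-up true values and predictions.
reg_mse = mse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
reg_mae = mae([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])

# Classification: made-up labels and predictions.
true_labels = ["spam", "spam", "not spam", "not spam"]
pred_labels = ["spam", "not spam", "not spam", "spam"]
clf_accuracy = accuracy(true_labels, pred_labels)
clf_f1 = f1_score(true_labels, pred_labels, positive="spam")
```

MSE and MAE measure how far numeric predictions fall from the labels, while accuracy and F1 count how often categorical predictions are right. This is why the two problem types are evaluated with different metrics.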

Training Data and Test Data

Splitting the Data

Divide the available data into a training set and a test set.

  • Training data: used to fit the model
  • Test data: used to evaluate generalization performance (never used for training)

Figure 4: Data split. All data (100%) is divided into a training set (70-80%) and a test set (20-30%).
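A random split along these lines might look like the sketch below. The helper name `split_train_test`, the default 20% test ratio, and the fixed seed are assumptions for illustration. Libraries such as scikit-learn provide their own, more complete splitting utilities.

```python
import random


def split_train_test(data, test_ratio=0.2, seed=0):
    """Shuffle a copy of the data, then cut it into training and test sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(100))  # stand-in for 100 labeled examples
train_set, test_set = split_train_test(samples, test_ratio=0.2)
```

Shuffling before cutting matters when the data is stored in some order (for example, sorted by date or by class), since an unshuffled cut could leave the test set unrepresentative.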

Why the Split Matters

Strong performance on the training data is meaningless if performance on unseen data is poor.

The test set serves as a simulation of "unseen data" so we can evaluate generalization performance.

What Is Overfitting?

Overfitting is the phenomenon where a model becomes so specialized to the training data that its performance on unseen data drops. A model that scores high on the training data but poorly on the test data is showing signs of overfitting. The train/test split is the most basic technique for detecting it.
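The train-versus-test comparison can be made concrete with a model that simply memorizes its training data. The sketch below uses a 1-nearest-neighbor lookup as the memorizer. That choice, the helper names, and the data (an underlying relation y = 2x with ±1 noise added to the training labels) are all assumptions made for illustration.

```python
def mean_squared_error(model, pairs):
    """Average squared gap between the model's predictions and the labels."""
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)


def make_memorizer(train_pairs):
    """1-nearest-neighbor model: return the label of the closest training input."""
    def model(x):
        _, nearest_y = min(train_pairs, key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return model


# Made-up data: the underlying relation is y = 2x; training labels carry +/-1 noise.
train_pairs = [(float(x), 2.0 * x + (1.0 if x % 2 == 0 else -1.0)) for x in range(10)]
test_pairs = [(x + 0.4, 2.0 * (x + 0.4)) for x in range(10)]

model = make_memorizer(train_pairs)
train_error = mean_squared_error(model, train_pairs)  # the noise is reproduced exactly
test_error = mean_squared_error(model, test_pairs)    # the memorized noise now hurts
overfitting_suspected = test_error > train_error
```

The memorizer is perfect on the data it has seen, yet it carries the training noise over to new inputs. A large gap between the two errors is the warning sign that the split is designed to reveal.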

Key Principles

  • The test set must never be used for training.
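  • Evaluate on the test set only once, as the final evaluation. Choosing between models based on repeated test scores gradually tunes them to the test set.
  • If you need to evaluate repeatedly (for example, to compare models or tune settings), set aside a separate validation set.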
  • The 70-80% / 20-30% ratio is just a guideline. With small datasets, shift toward training (e.g., 90% / 10%) or use cross-validation to average over multiple splits.
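The cross-validation idea mentioned in the last principle can be sketched as k-fold index generation: every sample is held out exactly once, and the k scores are averaged afterwards. The function name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
```

To use it, fit the model on each fold's training indices, score it on the held-out indices, and average the k scores. Because every sample is used for both fitting and evaluation across the folds, this suits small datasets where a single held-out set would be too small to trust.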

Summary

  • Supervised learning: learn from input-label pairs
  • Regression: predict a continuous value (price, temperature, etc.)
  • Classification: predict a category (spam detection, image recognition, etc.)
  • Data splitting: separate training and test sets to evaluate generalization performance
