Q: What is the difference between regression and classification?

If the output y is a continuous value (a real number), the task is regression; if it is a discrete category, the task is classification. Examples of regression include price prediction and temperature forecasting, evaluated with metrics such as MSE or MAE. Examples of classification include spam detection and image recognition, evaluated with metrics such as accuracy or F1 score. The choice of loss function and algorithm depends on the type of problem.

Q: Why do we split the data into training and test sets?

If you measure performance only on the training data, a model that has memorized the training data (i.e., overfitted) will appear to perform well, and you cannot assess its true predictive ability (generalization performance). Splitting the available data into roughly 70-80% training and 20-30% test, and using the test set as a simulation of unseen data, lets you objectively evaluate performance on new inputs.

Q: Can I tune the model by repeatedly evaluating on the test set?

No. As a rule, use the test set only once, for the final evaluation. Each time you adjust the model in response to a test score, information about the test set leaks into your modeling choices, so the test set effectively becomes part of the training data and your estimate of generalization performance becomes optimistically biased. When you need to evaluate repeatedly, for example during hyperparameter tuning, carve a separate validation set out of the training data and keep the test set untouched until the very end.
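The workflow described in this answer can be sketched as follows. This is a minimal illustration, not a prescribed method: the noisy data, the k-nearest-neighbor regressor, the candidate values of `k`, and the 24/8/8 split are all assumptions made up for the example.

```python
import random

def knn_predict(train, x, k):
    """Average the labels of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error of the k-NN model on the (x, y) pairs in data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
# Hypothetical noisy data around y = 2x (made up for illustration).
data = [(i / 2, 2 * (i / 2) + rng.gauss(0, 1)) for i in range(40)]
rng.shuffle(data)
train, val, test = data[:24], data[24:32], data[32:]

# Tune on the validation set: evaluate as many times as needed.
candidates = [1, 3, 5, 7]
best_k = min(candidates, key=lambda k: mse(train, val, k))

# Touch the test set exactly once, with the setting chosen above.
final_test_mse = mse(train, test, best_k)
print(best_k, final_test_mse)
```

The point of the design is that `test` never influences which `k` is chosen; only `val` does.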

Chapter 3: Supervised Learning

Q: What is supervised learning?

Supervised learning uses training data consisting of input-label pairs (x, y) to learn a function f: x -> y that predicts y for new inputs. The labels act as a supervisory signal: the error between the model's prediction and the correct label is fed back to refine the function. Typical examples include housing price prediction (regression) and spam detection (classification).

What Is Supervised Learning?

Supervised Learning

Given pairs of input $\mathbf{x}$ and correct label $y$, learn a function $f$ that predicts the output for new inputs.

$$\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\} \rightarrow f: \mathbf{x} \mapsto y$$

Figure 1: How supervised learning works — training data feeds an algorithm that produces a model f

What Is the "Supervisor"?

The correct label $y$ acts as the supervisor. When the model's prediction is wrong, the gap between the prediction and the label (the error) is fed back to correct the model.
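As a minimal sketch of this feedback loop, the following Python fits a one-feature linear model by gradient descent on the mean squared error. The toy data, the learning rate `lr`, the epoch count, and the function name `train` are assumptions made up for illustration; the text does not prescribe a particular learning algorithm.

```python
# Hypothetical toy data generated from y = 2*x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def train(xs, ys, lr=0.05, epochs=2000):
    """Fit f(x) = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error = prediction minus correct label, for every training pair.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Feed the error back: move w and b against the gradient of the MSE.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train(xs, ys)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0 for this toy data
```

Here the labels `ys` play the role of the supervisor: without them there would be no error to feed back.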

Regression

Regression is the case where the output $y$ is a continuous value. The task is to predict a numeric quantity.

Figure 2: Regression — predicting a continuous value (e.g., housing price) with a linear fit

Examples of Regression Problems

Housing price prediction: floor area, building age, location → price (in tens of thousands of yen)
Sales forecasting: ad spend, season, day of week → revenue (yen)
Temperature forecasting: past weather data → tomorrow's high temperature (°C)
Stock price prediction: past price movements → next-day closing price (yen)
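To make the housing example concrete, here is a small sketch that fits a straight line to a single feature (floor area) with the closed-form least-squares solution. The five data points and the helper name `fit_line` are hypothetical; real housing data would have more features and noise.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: floor area (m^2) -> price (tens of thousands of yen).
areas = [40, 55, 70, 85, 100]
prices = [2000, 2700, 3500, 4200, 5000]

slope, intercept = fit_line(areas, prices)
predicted = slope * 60 + intercept  # predicted price for a 60 m^2 home
print(slope, intercept, predicted)  # about 50, -20, 2980 for this toy data
```

The output is a real number on a continuous scale, which is what makes this a regression problem.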

Classification

Classification is the case where the output $y$ is a discrete category. The task is to predict a class.

Figure 3: Classification — separating two classes with a decision boundary

Examples of Classification Problems

Spam detection: email contents → spam / not spam (binary classification)
Image recognition: an image → cat / dog / bird / ... (multiclass classification)
Medical diagnosis: test results → positive / negative
Sentiment analysis: text → positive / negative / neutral
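As an illustration of a binary classifier, the sketch below uses a nearest-centroid rule: each class is summarized by the mean of its training points, and a new point receives the label of the closest mean. This technique is chosen here only for brevity; the two features (link count and exclamation-mark count) and all data values are hypothetical.

```python
import math

def fit_centroids(points, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for (x1, x2), label in zip(points, labels):
        s1, s2 = sums.get(label, (0.0, 0.0))
        sums[label] = (s1 + x1, s2 + x2)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (s1 / counts[label], s2 / counts[label])
        for label, (s1, s2) in sums.items()
    }

def predict(centroids, point):
    """Return the label whose centroid is nearest to the point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

# Hypothetical features per email: (number of links, number of '!' characters).
points = [(0, 1), (1, 0), (1, 1), (6, 5), (7, 6), (8, 4)]
labels = ["not spam"] * 3 + ["spam"] * 3

centroids = fit_centroids(points, labels)
print(predict(centroids, (5, 5)))  # spam
print(predict(centroids, (1, 2)))  # not spam
```

With two classes, this rule implies a straight-line decision boundary: the set of points equally distant from the two centroids.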

Regression vs. Classification

Aspect	Regression	Classification
Output	Continuous value (real number)	Discrete value (category)
Examples	Price, temperature	Spam / not spam, image category
Evaluation metrics	MSE, MAE	Accuracy, F1 score
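The four metrics named in the table can be computed directly from their standard definitions. The sketch below uses plain Python; the function names and the small example arrays are made up for illustration, and the F1 score is computed for a single positive class.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Regression example (hypothetical values).
reg_true, reg_pred = [3.0, 5.0, 2.0], [2.0, 5.0, 4.0]
print(mse(reg_true, reg_pred), mae(reg_true, reg_pred))  # about 1.667 and 1.0

# Classification example (hypothetical values, 1 = positive class).
cls_true, cls_pred = [1, 1, 0, 0, 1], [1, 0, 0, 1, 1]
print(accuracy(cls_true, cls_pred), f1_score(cls_true, cls_pred))  # 0.6 and about 0.667
```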

Training Data and Test Data

Splitting the Data

Divide the available data into a training set and a test set.

Training data: used to fit the model
Test data: used to evaluate generalization performance (never used for training)

Figure 4: Data split — training 70-80%, test 20-30%
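A split like the one in Figure 4 can be sketched with a shuffle followed by a slice. The function name `train_test_split`, the default 20% test ratio, and the fixed seed are assumptions for this example; shuffling first is a common precaution so that any ordering in the original data does not end up concentrated in one of the two sets.

```python
import random

def train_test_split(data, test_ratio=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

samples = list(range(10))  # stand-in for ten labeled examples
train_set, test_set = train_test_split(samples, test_ratio=0.2)
print(len(train_set), len(test_set))  # 8 2
```

Every example lands in exactly one of the two lists, so nothing used for evaluation is also used for fitting.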

Why the Split Matters

Strong performance on the training data is meaningless if performance on unseen data is poor.

The test set serves as a simulation of "unseen data" so we can evaluate generalization performance.

What Is Overfitting?

Overfitting is the phenomenon where a model becomes so specialized to the training data that its performance on unseen data drops. A model that scores high on the training data but poorly on the test data is showing signs of overfitting. The train/test split is the most basic technique for detecting it.
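The gap between training and test scores can be demonstrated with an extreme, deliberately overfitted model: a lookup table that memorizes every training pair. The class `MemorizingClassifier`, the "small"/"big" rule, and the data are hypothetical and exist only to make the effect visible.

```python
from collections import Counter

class MemorizingClassifier:
    """Deliberately overfitted model: stores every training pair verbatim."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        # For inputs never seen in training, fall back to the most common label.
        self.fallback = Counter(ys).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.table.get(x, self.fallback)

def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical rule behind the labels: "big" when x >= 10.
train_x = [1, 3, 5, 8, 12, 20]
train_y = ["small", "small", "small", "small", "big", "big"]
test_x = [2, 11, 15, 30]
test_y = ["small", "big", "big", "big"]

model = MemorizingClassifier().fit(train_x, train_y)
train_acc = accuracy(model, train_x, train_y)
test_acc = accuracy(model, test_x, test_y)
print(train_acc, test_acc)  # 1.0 0.25
```

Judged on the training data alone, this model looks perfect; only the held-out test set reveals that it has learned nothing about the underlying rule.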

Key Principles

The test set must never be used for training.
Evaluate on the test set only once, for the final evaluation.
If you need to evaluate multiple times, prepare a separate validation set.
The 70-80% / 20-30% ratio is just a guideline. With small datasets, shift toward training (e.g., 90% / 10%) or use cross-validation to average over multiple splits.
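Cross-validation, mentioned in the last principle, can be sketched as a generator of index splits. The function name `k_fold_indices` and the choice of contiguous folds are assumptions for this example; in practice the indices are usually shuffled first, and the test set is still held out separately from all folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

A model would be fitted once per fold on `train_idx` and scored on `val_idx`, and the scores averaged. This uses every sample for validation once, which helps when the dataset is too small for a single large hold-out set.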

Summary

Supervised learning: learn from input-label pairs
Regression: predict a continuous value (price, temperature, etc.)
Classification: predict a category (spam detection, image recognition, etc.)
Data splitting: separate training and test sets to evaluate generalization performance
