Machine Learning Intermediate
Deep Learning Foundations — Intermediate (university year 3-4 level)
About This Chapter
At the intermediate level, we study the foundations of neural networks and deep learning. Starting from the multilayer perceptron, we proceed to CNNs for image recognition and RNNs for sequential data. The goal is to understand backpropagation, optimization algorithms, and regularization techniques, and to grasp why deep learning works.
Prerequisites
- Basic-level content (classical ML methods)
- Linear algebra (matrix operations, eigenvalues)
- Calculus (chain rule, gradients)
- Fundamentals of probability theory
Table of Contents
1. Introduction to Neural Networks
Basic structures of deep learning.
- Perceptron
- Multilayer Perceptron (MLP)
- Activation Functions
2. Backpropagation
The algorithm for gradient computation.
- Application of the Chain Rule
- Computational Graphs
- Vanishing Gradient Problem
3. Optimization Algorithms
Improvements on gradient descent.
- SGD and Mini-batch
- Momentum, RMSprop
- Adam
5. Convolutional Neural Networks
The foundation of image recognition.
- Convolutional Layers
- Pooling Layers
- CNN Architectures
6. Advances in CNNs
Representative architectures.
- LeNet, AlexNet, VGG
- ResNet (Residual Connections)
- Transfer Learning
7. Recurrent Neural Networks
Processing sequential data.
- RNN Architecture
- Backpropagation Through Time (BPTT)
- Long-term Dependency Problem
9. Embeddings and Representation Learning
Handling discrete data.
- One-hot Encoding
- Word2Vec
- Embedding Layer
10. Deep Learning Frameworks
Implementation in practice.
- PyTorch / TensorFlow
- Automatic Differentiation
- GPU Utilization
11. Dimensionality Reduction
Visualization and compression of high-dimensional data.
- PCA, Kernel PCA
- t-SNE, UMAP
- Factor Analysis
12. Hyperparameter Optimization
Systematic methods for model tuning.
- Grid / Random Search
- Bayesian Optimization, Optuna
- Hyperband
13. Time Series Forecasting
Analysis of data along the time axis.
- ARIMA, SARIMA
- Prophet, State Space Models
- Transformer-based Methods
Key Concepts and Methods
Multilayer Perceptron
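Output of layer $l$: $\boldsymbol{h}^{(l)} = \sigma(\boldsymbol{W}^{(l)} \boldsymbol{h}^{(l-1)} + \boldsymbol{b}^{(l)})$, where $\boldsymbol{h}^{(0)} = \boldsymbol{x}$ is the input.
The nonlinear activation function $\sigma$ is applied elementwise. Without it, any stack of layers would collapse into a single affine map, so the nonlinearity is what allows the network to represent complex functions.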
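The layer formula above can be traced by hand on a very small network. The following is a minimal sketch in plain Python, assuming a logistic sigmoid for $\sigma$ and a hypothetical 2-2-1 network; the helper names (`dense`, `mlp_forward`) and the weight values are illustrative and not part of the chapter.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, assumed here as the activation sigma."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(W, b, h_prev, activation=sigmoid):
    """One layer: activation(W h_prev + b), with W given as a list of rows."""
    return [
        activation(sum(w * h for w, h in zip(row, h_prev)) + b_i)
        for row, b_i in zip(W, b)
    ]


def mlp_forward(layers, x):
    """Apply each (W, b) pair in turn, starting from h^(0) = x."""
    h = x
    for W, b in layers:
        h = dense(W, b, h)
    return h


# Hypothetical 2-2-1 network with hand-picked weights (illustrative only).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # W^(1), b^(1)
    ([[1.0, 1.0]], [0.0]),                    # W^(2), b^(2)
]
output = mlp_forward(layers, [0.0, 0.0])
print(output)
```

For the zero input, both hidden units evaluate to $\sigma(0) = 0.5$, so the output unit computes $\sigma(0.5 + 0.5) = \sigma(1) \approx 0.731$, which can be checked by hand.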
Backpropagation
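Compute the gradient of the loss $L$ with respect to the parameters $\boldsymbol{W}^{(l)}$ by applying the chain rule, propagating from the output layer back to the input layer: $$\dfrac{\partial L}{\partial \boldsymbol{W}^{(l)}} = \dfrac{\partial L}{\partial \boldsymbol{h}^{(l)}} \dfrac{\partial \boldsymbol{h}^{(l)}}{\partial \boldsymbol{W}^{(l)}}$$
Concretely, writing $\boldsymbol{z}^{(l)} = \boldsymbol{W}^{(l)} \boldsymbol{h}^{(l-1)} + \boldsymbol{b}^{(l)}$ and $\boldsymbol{\delta}^{(l)} = \partial L / \partial \boldsymbol{z}^{(l)}$, the gradient is $\partial L / \partial \boldsymbol{W}^{(l)} = \boldsymbol{\delta}^{(l)} (\boldsymbol{h}^{(l-1)})^{\top}$, and the error signal is passed to the previous layer by $\boldsymbol{\delta}^{(l-1)} = \sigma'(\boldsymbol{z}^{(l-1)}) \odot (\boldsymbol{W}^{(l)})^{\top} \boldsymbol{\delta}^{(l)}$.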
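One way to build confidence in a chain-rule derivation is to compare it against finite differences. The sketch below does this for a single sigmoid layer. It assumes a squared-error loss $L = \tfrac{1}{2}\lVert \boldsymbol{h} - \boldsymbol{y} \rVert^2$, which the chapter does not specify, and all function names and numeric values are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, x):
    """h = sigmoid(W x + b) for a single layer; W is a list of rows."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def loss(W, b, x, y):
    """Squared-error loss L = 0.5 * ||h - y||^2 (an assumed choice)."""
    return 0.5 * sum((hi - yi) ** 2 for hi, yi in zip(forward(W, b, x), y))


def grad_W(W, b, x, y):
    """Analytic gradient via the chain rule: dL/dW_ij = delta_i * x_j."""
    h = forward(W, b, x)
    # dL/dh_i = h_i - y_i, and dh_i/dz_i = h_i * (1 - h_i) for the sigmoid.
    delta = [(hi - yi) * hi * (1.0 - hi) for hi, yi in zip(h, y)]
    return [[d * xj for xj in x] for d in delta]


def numeric_grad_W(W, b, x, y, eps=1e-6):
    """Central finite differences, used only to check grad_W."""
    grads = []
    for i, row in enumerate(W):
        grad_row = []
        for j in range(len(row)):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            diff = loss(W_plus, b, x, y) - loss(W_minus, b, x, y)
            grad_row.append(diff / (2 * eps))
        grads.append(grad_row)
    return grads


# Illustrative values, not taken from the text.
W = [[0.2, -0.4], [0.7, 0.1]]
b = [0.0, -0.3]
x = [1.0, 2.0]
y = [1.0, 0.0]

analytic = grad_W(W, b, x, y)
numeric = numeric_grad_W(W, b, x, y)
max_error = max(abs(a - n)
                for row_a, row_n in zip(analytic, numeric)
                for a, n in zip(row_a, row_n))
print(max_error)
```

If the analytic and numeric gradients disagree by more than a small tolerance, the derivation or its implementation likely contains a mistake. The same check scales to deeper networks, although automatic differentiation is the usual tool in practice.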
Convolution Operation
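Convolution of input $\boldsymbol{X}$ with filter $\boldsymbol{K}$ (as is conventional in deep learning, the filter is not flipped, so this is strictly a cross-correlation):
$(\boldsymbol{X} * \boldsymbol{K})_{i,j} = \displaystyle\sum_{m,n} X_{i+m, j+n} K_{m,n}$
Detects local patterns. Because the same filter weights are applied at every position, the operation is translation-equivariant: shifting the input shifts the output by the same amount. Approximate translation invariance arises when convolution is combined with pooling.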
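The sum in the formula above can be written out directly as nested loops. This is a minimal sketch assuming no padding and stride 1 (so-called "valid" output); the function name `conv2d_valid` and the toy image are illustrative. It also shifts the input by one pixel to show how the output moves with it.

```python
def conv2d_valid(X, K):
    """Slide K over X with no padding and stride 1, following the formula."""
    kh, kw = len(K), len(K[0])
    out_h = len(X) - kh + 1
    out_w = len(X[0]) - kw + 1
    return [
        [
            sum(X[i + m][j + n] * K[m][n] for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x4 image with a bright 2x2 square, and a horizontal difference filter.
X = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
K = [[1, -1]]

Y = conv2d_valid(X, K)

# Shift the image one pixel to the right (zero fill on the left).
X_shifted = [[0] + row[:-1] for row in X]
Y_shifted = conv2d_valid(X_shifted, K)

for row, row_shifted in zip(Y, Y_shifted):
    print(row, row_shifted)
```

The filter responds with $-1$ at the left edge of the square and $+1$ at the right edge. After the shift, the same responses appear one column further right (the rightmost one falls off the border), which is the local pattern detection the text describes.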
Residual Connection (ResNet)
$\boldsymbol{y} = F(\boldsymbol{x}) + \boldsymbol{x}$ (skip connection). Makes the identity mapping easier to learn, enabling the training of very deep networks.
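To see why the identity mapping becomes easy to learn, note that the block reduces to $\boldsymbol{y} = \boldsymbol{x}$ whenever the residual branch $F$ outputs zero. The sketch below illustrates this with two hypothetical branches (`zero_branch` and `small_branch`); in a real ResNet, $F$ would be a small stack of convolutional layers.

```python
def residual_block(x, F):
    """Return F(x) + x; F must preserve the length of x so the sum is defined."""
    fx = F(x)
    if len(fx) != len(x):
        raise ValueError("F(x) and x must have the same length")
    return [f_i + x_i for f_i, x_i in zip(fx, x)]


def zero_branch(x):
    """Hypothetical residual branch whose weights are all zero."""
    return [0.0 for _ in x]


def small_branch(x):
    """Hypothetical residual branch that applies a small correction."""
    return [0.1 * x_i for x_i in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_branch)    # equals x
corrected_out = residual_block(x, small_branch)  # x plus a small correction
print(identity_out, corrected_out)
```

A plain layer would have to learn weights that reproduce its input exactly in order to act as the identity, whereas the residual block only needs to drive $F$ toward zero.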
LSTM
Through gating mechanisms (input, forget, and output gates), LSTM learns long-term dependencies. The cell state $\boldsymbol{c}_t$ retains and updates information.
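The gating mechanism can be sketched as a single time step. The code below assumes the commonly used LSTM formulation without peephole connections: $\boldsymbol{c}_t = \boldsymbol{f}_t \odot \boldsymbol{c}_{t-1} + \boldsymbol{i}_t \odot \boldsymbol{g}_t$ and $\boldsymbol{h}_t = \boldsymbol{o}_t \odot \tanh(\boldsymbol{c}_t)$. The parameter layout (a dictionary of `(W, b)` pairs) and all numeric values are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(params, v):
    """Compute W v + b for params = (W, b), with W given as a list of rows."""
    W, b = params
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell (no peephole connections)."""
    v = h_prev + x  # concatenate previous hidden state and current input
    i = [sigmoid(z) for z in affine(params["input"], v)]        # input gate
    f = [sigmoid(z) for z in affine(params["forget"], v)]       # forget gate
    o = [sigmoid(z) for z in affine(params["output"], v)]       # output gate
    g = [math.tanh(z) for z in affine(params["candidate"], v)]  # candidate values
    # Cell state: keep part of the old state, write part of the candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Hypothetical 1-unit cell with 1-dimensional input. The weights are zero, so
# only the biases matter: a large forget bias and a very negative input bias
# make the cell hold on to its previous state.
zero_W = [[0.0, 0.0]]
params = {
    "input": (zero_W, [-10.0]),
    "forget": (zero_W, [10.0]),
    "output": (zero_W, [0.0]),
    "candidate": (zero_W, [0.0]),
}
h, c = lstm_step(x=[1.0], h_prev=[0.0], c_prev=[2.0], params=params)
print(h, c)
```

With the forget gate close to 1 and the input gate close to 0, the new cell state stays very close to the previous value of 2.0. This additive path through $\boldsymbol{c}_t$ is what allows information to be retained over many time steps.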
Applications at This Level
Image Classification
Image recognition with CNNs. Leveraging models pre-trained on ImageNet via transfer learning.
Object Detection
Architectures such as YOLO and Faster R-CNN. Predicting bounding boxes.
Sentiment Analysis
Text classification with LSTM/GRU. Determining positive or negative polarity from reviews.
Time Series Forecasting
Forecasting stock prices, demand, and sensor data with RNNs.
Learning Points
- Forward and Backward Pass: Trace through small examples by hand
- Gradient Flow: Understand why training is difficult in deep networks
- Architecture Design: Choose structures suited to the task
- Implementation and Experimentation: Gain experience writing code and tuning hyperparameters