What topics are covered in intermediate machine learning?

This level covers backpropagation, optimization algorithms (Adam and others), regularization (Dropout, BatchNorm), CNNs (convolution, pooling, architectures), RNNs, LSTM/GRU, word embeddings, dimensionality reduction, hyperparameter optimization, time-series analysis, and the major deep learning frameworks.

What prerequisites are needed for intermediate machine learning?

You should have completed the intro and basic levels. In particular, you need foundational linear algebra, calculus, and probability/statistics, Python programming with NumPy and scikit-learn, and an understanding of the core machine learning algorithms.

How do CNN and RNN differ in their applications?

CNNs excel at extracting local spatial features and are ideal for images and audio spectrograms. RNNs specialize in learning temporal dependencies and are used for sequence data and NLP. Modern Transformers increasingly unify both capabilities through self-attention.

Machine Learning Intermediate

Deep Learning Foundations — Intermediate (university year 3-4 level)

About This Chapter

At the intermediate level, we study the foundations of neural networks and deep learning. Starting from the multilayer perceptron, we proceed to CNNs for image recognition and RNNs for sequential data. The goal is to understand backpropagation, optimization algorithms, and regularization techniques, and to grasp why deep learning works.

Prerequisites

Basic-level content (classical ML methods)
Linear algebra (matrix operations, eigenvalues)
Calculus (chain rule, gradients)
Fundamentals of probability theory

1. Introduction to Neural Networks

Basic structures of deep learning.

Perceptron
Multilayer Perceptron (MLP)
Activation Functions

2. Backpropagation

The algorithm for gradient computation.

Application of the Chain Rule
Computational Graphs
Vanishing Gradient Problem

3. Optimization Algorithms

Improvements on gradient descent.

SGD and Mini-batch
Momentum, RMSprop
Adam

4. Regularization Techniques

Preventing overfitting.

Dropout
Batch Normalization
Data Augmentation

5. Convolutional Neural Networks

The foundation of image recognition.

Convolutional Layers
Pooling Layers
CNN Architectures

6. Advances in CNNs

Representative architectures.

LeNet, AlexNet, VGG
ResNet (Residual Connections)
Transfer Learning

7. Recurrent Neural Networks

Processing sequential data.

RNN Architecture
BPTT
Long-term Dependency Problem

8. LSTM and GRU

Improvements via gating mechanisms.

LSTM Cell
GRU
Bidirectional RNN

9. Embeddings and Representation Learning

Handling discrete data.

One-hot Encoding
Word2Vec
Embedding Layer

10. Deep Learning Frameworks

Implementation in practice.

PyTorch / TensorFlow
Automatic Differentiation
GPU Utilization

11. Dimensionality Reduction

Visualization and compression of high-dimensional data.

PCA, Kernel PCA
t-SNE, UMAP
Factor Analysis

12. Hyperparameter Optimization

Systematic methods for model tuning.

Grid / Random Search
Bayesian Optimization, Optuna
Hyperband

13. Time Series Forecasting

Analysis of data along the time axis.

ARIMA, SARIMA
Prophet, State Space Models
Transformer-based Methods

Key Concepts and Methods

Multilayer Perceptron

Output of layer $l$: $\boldsymbol{h}^{(l)} = \sigma(\boldsymbol{W}^{(l)} \boldsymbol{h}^{(l-1)} + \boldsymbol{b}^{(l)})$
Through the nonlinear activation function $\sigma$, complex functions can be represented.
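The layer formula above can be traced in a few lines of NumPy. This is a minimal sketch, not a reference implementation: the sigmoid activation, the layer sizes, and the names `sigmoid` and `mlp_forward` are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function, used here as the activation sigma."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases, activation=sigmoid):
    """Apply h^(l) = sigma(W^(l) h^(l-1) + b^(l)) for each layer in turn."""
    h = x  # h^(0) is the input vector
    for W, b in zip(weights, biases):
        h = activation(W @ h + b)
    return h

# Hypothetical network: 3 inputs -> 4 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]

x = np.array([0.5, -1.0, 2.0])
y = mlp_forward(x, weights, biases)
print(y.shape)  # (2,)
```

If the activation were removed, the two matrix products would collapse into a single linear map, which is why the nonlinearity is what gives the network its expressive power.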

Backpropagation

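Compute the gradient of the loss $L$ with respect to the parameters $\boldsymbol{W}^{(l)}$ by applying the chain rule, propagating from the output layer back to the input layer: $$\dfrac{\partial L}{\partial \boldsymbol{W}^{(l)}} = \dfrac{\partial L}{\partial \boldsymbol{h}^{(l)}} \dfrac{\partial \boldsymbol{h}^{(l)}}{\partial \boldsymbol{W}^{(l)}}$$ The factor $\partial L / \partial \boldsymbol{h}^{(l)}$ is obtained from the layer above, $$\dfrac{\partial L}{\partial \boldsymbol{h}^{(l)}} = \dfrac{\partial L}{\partial \boldsymbol{h}^{(l+1)}} \dfrac{\partial \boldsymbol{h}^{(l+1)}}{\partial \boldsymbol{h}^{(l)}}$$ so each layer reuses the gradient already computed for the layer after it.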
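To see the chain rule at work, the following sketch runs a forward and backward pass through a two-layer network and compares one analytic gradient entry with a finite-difference estimate. The sigmoid activation, the squared-error loss, and all function names are assumptions made for this example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, t, W1, b1, W2, b2):
    """Return the loss and its gradients for a 2-layer sigmoid network."""
    # Forward pass
    h1 = sigmoid(W1 @ x + b1)
    h2 = sigmoid(W2 @ h1 + b2)
    loss = 0.5 * np.sum((h2 - t) ** 2)
    # Backward pass: start at the output and move toward the input.
    delta2 = (h2 - t) * h2 * (1.0 - h2)          # dL/dz2, using sigma' = s(1-s)
    dW2 = np.outer(delta2, h1)
    delta1 = (W2.T @ delta2) * h1 * (1.0 - h1)   # dL/dz1, reuses delta2
    dW1 = np.outer(delta1, x)
    return loss, (dW1, delta1, dW2, delta2)

def numeric_grad_W1(x, t, W1, b1, W2, b2, i, j, eps=1e-6):
    """Central-difference estimate of dL/dW1[i, j]."""
    Wp, Wm = W1.copy(), W1.copy()
    Wp[i, j] += eps
    Wm[i, j] -= eps
    lp, _ = forward_backward(x, t, Wp, b1, W2, b2)
    lm, _ = forward_backward(x, t, Wm, b1, W2, b2)
    return (lp - lm) / (2.0 * eps)

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([0.0, 1.0])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

loss, (dW1, db1, dW2, db2) = forward_backward(x, t, W1, b1, W2, b2)
approx = numeric_grad_W1(x, t, W1, b1, W2, b2, 0, 0)
print(abs(dW1[0, 0] - approx))  # difference should be close to zero
```

Checking analytic gradients against finite differences like this is a common way to catch sign and indexing mistakes when deriving a backward pass by hand.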

Convolution Operation

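Convolution of input $\boldsymbol{X}$ with filter $\boldsymbol{K}$: $(\boldsymbol{X} * \boldsymbol{K})_{i,j} = \displaystyle\sum_{m,n} X_{i+m, j+n} K_{m,n}$
Strictly speaking this is cross-correlation (the filter is not flipped), which is the form deep learning frameworks call convolution. It detects local patterns with shared weights and is translation-equivariant: shifting the input shifts the output by the same amount. Combined with pooling, this yields approximate translation invariance.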
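The sum above can be written directly as two nested loops. The sketch below assumes a single-channel input, stride 1, and no padding; the name `conv2d_valid` is hypothetical. It also shows that moving a pattern in the input moves the response in the output by the same offset.

```python
import numpy as np

def conv2d_valid(X, K):
    """out[i, j] = sum over m, n of X[i+m, j+n] * K[m, n] (no padding, stride 1)."""
    kh, kw = K.shape
    out_h = X.shape[0] - kh + 1
    out_w = X.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(X[i:i + kh, j:j + kw] * K)
    return out

# A diagonal-difference filter applied to a 4x4 ramp image.
X = np.arange(16, dtype=float).reshape(4, 4)
K = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d_valid(X, K))  # 3x3 array, every entry is -5.0

# Shift check: a single bright pixel moved by (1, 1) moves the response by (1, 1).
K2 = np.array([[1.0, 2.0], [3.0, 4.0]])
A = np.zeros((5, 5)); A[1, 1] = 1.0
B = np.zeros((5, 5)); B[2, 2] = 1.0
out_a, out_b = conv2d_valid(A, K2), conv2d_valid(B, K2)
print(np.array_equal(out_b[1:, 1:], out_a[:-1, :-1]))  # True
```

Production frameworks compute the same quantity with far faster kernels and add channels, stride, and padding; the loops here are only meant to mirror the formula.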

Residual Connection (ResNet)

$\boldsymbol{y} = F(\boldsymbol{x}) + \boldsymbol{x}$ (skip connection). Makes the identity mapping easier to learn, enabling the training of very deep networks.
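A small sketch makes the "identity is easy" argument concrete. Here $F$ is assumed to be a two-layer fully connected transformation with ReLU, which is a simplification: the blocks in ResNet use convolutions and batch normalization. The names `relu` and `residual_block` are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """Compute y = F(x) + x, where F(x) = W2 relu(W1 x + b1) + b2."""
    f = W2 @ relu(W1 @ x + b1) + b2
    return f + x  # skip connection adds the input back

d = 4
x = np.array([1.0, -2.0, 0.5, 3.0])

# With all parameters at zero, F(x) = 0 and the block is exactly the identity.
zeros_W, zeros_b = np.zeros((d, d)), np.zeros(d)
y_identity = residual_block(x, zeros_W, zeros_b, zeros_W, zeros_b)
print(np.array_equal(y_identity, x))  # True

# With small random parameters the output stays close to the input.
rng = np.random.default_rng(0)
W1, W2 = 0.01 * rng.normal(size=(d, d)), 0.01 * rng.normal(size=(d, d))
y_small = residual_block(x, W1, zeros_b, W2, zeros_b)
print(np.max(np.abs(y_small - x)))
```

Because the block only has to learn the deviation $F(\boldsymbol{x})$ from the identity, adding more blocks does not force the network to relearn a pass-through mapping from scratch.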

LSTM

Through gating mechanisms (input, forget, and output gates), LSTM learns long-term dependencies. The cell state $\boldsymbol{c}_t$ retains and updates information.
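One time step of an LSTM cell can be sketched as follows, using the commonly stated update $\boldsymbol{c}_t = \boldsymbol{f}_t \odot \boldsymbol{c}_{t-1} + \boldsymbol{i}_t \odot \boldsymbol{g}_t$ and $\boldsymbol{h}_t = \boldsymbol{o}_t \odot \tanh(\boldsymbol{c}_t)$. The stacked weight layout, the gate ordering, and the name `lstm_step` are assumptions of this example, not the convention of any particular framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step.

    W has shape (4*H, H + D) and b has shape (4*H,), with rows stacked in the
    (assumed) order: input gate, forget gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values
    c = f * c_prev + i * g       # cell state update
    h = o * np.tanh(c)           # hidden state
    return h, c

D, H = 3, 2
x = np.array([0.2, -0.4, 1.0])
h_prev = np.zeros(H)
c_prev = np.array([0.7, -1.3])

# Force the forget gate open and the input gate shut to show state retention.
W = np.zeros((4 * H, H + D))
b = np.concatenate([np.full(H, -50.0), np.full(H, 50.0), np.zeros(H), np.zeros(H)])
h, c = lstm_step(x, h_prev, c_prev, W, b)
print(np.allclose(c, c_prev))  # True: the cell state is carried over unchanged
```

When the forget gate stays near 1, the cell state passes through the step almost untouched, which is the mechanism that lets information and gradients survive over many time steps.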

Applications at This Level

Image Classification

Image recognition with CNNs. Leveraging models pre-trained on ImageNet via transfer learning.

Object Detection

Architectures such as YOLO and Faster R-CNN. Predicting bounding boxes.

Sentiment Analysis

Text classification with LSTM/GRU. Determining positive or negative polarity from reviews.

Time Series Forecasting

Forecasting stock prices, demand, and sensor data with RNNs.

Learning Points

Forward and Backward Pass: Trace through small examples by hand
Gradient Flow: Understand why training is difficult in deep networks
Architecture Design: Choose structures suited to the task
Implementation and Experimentation: Gain experience writing code and tuning hyperparameters