Reviewing the Steps of Gradient Descent

Explore the fundamental steps of gradient descent applied to a simple linear regression problem using PyTorch. Understand how to generate synthetic data, compute predictions, calculate mean squared error loss, find gradients with respect to parameters, and update those parameters iteratively. This lesson outlines batch, stochastic, and mini-batch gradient descent types and shows how repeated epochs train the model.

Simple linear regression

Most tutorials start with some nice and pretty image classification problems to illustrate how to use PyTorch. It may seem cool, but I believe it distracts you from learning how PyTorch works.

For this reason, in this first example, we will stick with a simple and familiar problem: a linear regression with a single feature x! It does not get much simpler than that. It has the following equation:

y = b + wx + \epsilon

It is also possible to think of it as the simplest neural network possible: one input, one output, and no activation function (that is, linear).
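As a quick illustration (not part of this lesson's NumPy walkthrough), this one-input, one-output model corresponds to a single linear layer in PyTorch:

Python 3.5
import torch
import torch.nn as nn

# One input, one output, no activation: yhat = b + w * x
# (nn.Linear holds both the weight w and the bias b)
model = nn.Linear(in_features=1, out_features=1)
x = torch.rand(100, 1)           # 100 points, one feature each
yhat = model(x)                  # forward pass
print(model.weight, model.bias)  # w and b, randomly initialized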

Note: This lesson serves as a review for the previous chapter.

Data generation

Let us start generating some synthetic data. We start with a vector of 100 (N) points for our feature (x) and create our labels (y) using b = 1, w = 2, and some Gaussian noise (epsilon).

Synthetic data generation

The following code generates our synthetic data:

Python 3.5
import numpy as np

# Variables initialized
true_b = 1
true_w = 2
N = 100
# Data generation process
np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = (.1 * np.random.randn(N, 1))
y = true_b + true_w * x + epsilon
# Displaying the data that we generated (first 5 values)
print("X:", x[:5], "\n\nY:", y[:5])

Splitting data

Next, let us split our synthetic data into train and validation sets, shuffling the array of indexes and using the first 80 shuffled points for training.

Python 3.5
# Shuffles the indices
idx = np.arange(N)
np.random.shuffle(idx)
# Uses first 80 random indices for train
train_idx = idx[:int(N*.8)]
# Uses the remaining indices for validation
val_idx = idx[int(N*.8):]
# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
# Displaying the training set which we will be using in this chapter (first 5 values)
print('x_train: {}'.format(x_train[:5]))

The following figure shows subplots of both the training set (x_train, y_train) and the validation set (x_val, y_val) of the generated data.

We know that b = 1 and w = 2, but now let us see how close we can get to the true values by using gradient descent and the 80 points in the training set (for training: N = 80).

Gradient descent

We will now cover the five basic steps you need to go through to use gradient descent, along with the corresponding NumPy code.

Step 0 - Random initialization

For training a model, you need to randomly initialize the parameters/weights. In our case, we have only two: b and w.

Python 3.5
import numpy as np
# Step 0 - initializes parameters "b" and "w" randomly
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)
print(b, w)

Step 1 - Compute model’s predictions

This is the forward pass. It simply computes the model’s predictions using the current values of the parameters/weights.

At the very beginning, we will be producing really bad predictions, as we started with random values from Step 0.

Python 3.5
# Step 1 - computes our model's predicted output - forward pass
yhat = b + w * x_train

You can see the values of these predictions by running the following code:

Python 3.5
# Step 0 - initializes parameters "b" and "w" randomly
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)
# Step 1 - computes our model's predicted output - forward pass
yhat = b + w * x_train
# Displaying the model's predicted output using current parameter values (first five values)
print(yhat[:5])

Step 2 - Compute the loss

For a regression problem, the loss is given by the Mean Squared Error (MSE). As a reminder, MSE is the average of all squared errors, meaning the average of all squared differences between labels (y) and predictions (b + wx).
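Written out for n data points, with predictions \hat{y}_i = b + w x_i, the loss is:

MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (b + w x_i - y_i)^2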

Let us now compute the loss using Python.

In the code below, we are using all data points of the training set to compute the loss, so n = N = 80; this means we are performing batch gradient descent.

Python 3.5
# Step 2 - computing the loss
# We are using ALL data points, so this is BATCH gradient
# descent. How wrong is our model? That's the error!
error = (yhat - y_train)
# It is a regression, so it computes mean squared error (MSE)
loss = (error ** 2).mean()
print(loss)

Gradient descent types:

  • If we use all points in the training set (n = N) to compute the loss, we are performing batch gradient descent.
  • If we were to use a single point (n = 1) each time, it would be stochastic gradient descent.
  • Anything in between, with n between 1 and N, characterizes mini-batch gradient descent (see the sketch below).
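The only thing that changes between the three variants is which points go into the loss. Here is a minimal sketch, assuming x_train, y_train, b, and w from the snippets above (the batch size of 16 is a hypothetical choice):

Python 3.5
# Batch gradient descent: loss over all N training points
batch_loss = (((b + w * x_train) - y_train) ** 2).mean()

# Stochastic gradient descent: loss over a single random point
i = np.random.randint(len(x_train))
sgd_loss = (((b + w * x_train[i]) - y_train[i]) ** 2).mean()

# Mini-batch gradient descent: loss over a random subset of n points
batch_size = 16
mb_idx = np.random.choice(len(x_train), batch_size, replace=False)
mb_loss = (((b + w * x_train[mb_idx]) - y_train[mb_idx]) ** 2).mean()

print(batch_loss, sgd_loss, mb_loss)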

Step 3 - Compute the gradients

A gradient is a partial derivative. Why partial? Because one computes it with respect to (w.r.t.) a single parameter. Since we have two parameters, b and w, we must compute two partial derivatives.

A derivative tells you how much a given quantity changes when you slightly vary some other quantity. In our case, how much does our MSE loss change when we vary each one of our two parameters separately?

Gradient = how much the loss changes if one parameter changes a little bit!
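For the MSE loss above, taking the partial derivative w.r.t. each parameter gives:

\frac{\partial MSE}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \qquad \frac{\partial MSE}{\partial w} = \frac{2}{n} \sum_{i=1}^{n} x_i (\hat{y}_i - y_i)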

Using the equations above, we will now compute the gradients with respect to the b and w coefficients.

Python 3.5
# Step 3 - computes gradients for both "b" and "w" parameters
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()
print(b_grad, w_grad)
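If you want to verify these analytic gradients, one option (a sanity check, not part of the original steps) is a finite-difference approximation, which should closely match b_grad and w_grad:

Python 3.5
# Nudge each parameter by a tiny eps and watch the loss respond
# (assumes x_train, y_train, b, and w from above)
def mse(b, w):
    return (((b + w * x_train) - y_train) ** 2).mean()

eps = 1e-6
b_grad_approx = (mse(b + eps, w) - mse(b - eps, w)) / (2 * eps)
w_grad_approx = (mse(b, w + eps) - mse(b, w - eps)) / (2 * eps)
print(b_grad_approx, w_grad_approx)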

Step 4 - Update the parameters

In the final step, we use the gradients to update the parameters. Since we are trying to minimize our loss, we reverse the sign of the gradient for the update.

There is still another hyperparameter to consider: the learning rate, denoted by the Greek letter eta (η, which looks like the letter n), is the multiplicative factor that we apply to the gradient for the parameter update.
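In equation form, the update rule is:

b \leftarrow b - \eta \frac{\partial MSE}{\partial b} \qquad w \leftarrow w - \eta \frac{\partial MSE}{\partial w}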

In our example, let us start with a value of 0.1 for the learning rate (which is a relatively big value as far as learning rates are concerned!).

Python 3.5
# Sets learning rate - this is "eta" ~ the "n"-like Greek letter
lr = 0.1
print(b, w)
# Step 4 - updates parameters using gradients and the learning rate
b = b - lr * b_grad
w = w - lr * w_grad
print(b, w)

Step 5 - Rinse and repeat!

Now we use the updated parameters to go back to Step 1 and restart the process.

Definition of epoch:

An epoch is complete whenever every point in the training set (N) has already been used in all steps: forward pass, computing loss, computing gradients, and updating parameters.

During one epoch, we perform at least one update, but no more than N updates.

The number of updates (N/n) will depend on the type of gradient descent being used:

  • For batch (n = N) gradient descent, this is trivial, as it uses all points for computing the loss. One epoch is the same as one update.
  • For stochastic (n = 1) gradient descent, one epoch means N updates since every individual data point is used to perform an update.
  • For mini-batch (of size n), one epoch has N/n updates since a mini-batch of n data points is used to perform an update.

Repeating this process over and over for many epochs is training a model in a nutshell.
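Putting Steps 0 through 4 together, a minimal sketch of the full batch gradient descent loop for our problem (assuming x_train and y_train from above; the 1,000 epochs are an arbitrary choice) could look like this:

Python 3.5
import numpy as np

# Step 0 - initializes parameters "b" and "w" randomly
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

lr = 0.1          # learning rate (eta)
n_epochs = 1000   # arbitrary number of epochs for this sketch

for epoch in range(n_epochs):
    # Step 1 - forward pass
    yhat = b + w * x_train
    # Step 2 - MSE loss over ALL points (batch gradient descent)
    error = yhat - y_train
    loss = (error ** 2).mean()
    # Step 3 - gradients w.r.t. "b" and "w"
    b_grad = 2 * error.mean()
    w_grad = 2 * (x_train * error).mean()
    # Step 4 - parameter update
    b = b - lr * b_grad
    w = w - lr * w_grad

print(b, w)  # should land close to the true values b = 1 and w = 2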

Practice

Try to solve this short quiz to test your understanding of the concepts explained in this lesson:

Technical Quiz

1. Which steps are executed inside the gradient descent training loop? (Select all that apply.)

A. Computing loss
B. Making predictions (forward pass)
C. Randomly initializing parameters
D. Computing gradients
E. Updating parameters