Popular Optimization Algorithms

Explore popular optimization algorithms used in training deep neural networks. Understand the challenges of basic SGD and how improvements like Momentum, Adagrad, RMSProp, and Adam address these issues. This lesson helps you grasp adaptive learning rates, momentum concepts, and practical examples to enhance model training.

We'll cover the following...

Concerns on SGD
Adding momentum
Adaptive learning rate
- Adagrad
- RMSprop
Adam

Concerns on SGD

The gradients are still noisy because we estimate them from only a small sample of our dataset. A noisy update may therefore point away from the true gradient of the loss over the full dataset.
Choosing a good learning rate is tricky and requires time-consuming experimentation with different hyperparameter values.
The same learning rate is applied to all of our parameters, which can become problematic for features with different frequencies or significance.
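The noise in mini-batch gradients can be seen in a small experiment. The sketch below is only an illustration under stated assumptions: the toy dataset, the one-parameter model `y = w * x`, the batch size of 8, and the helper name `gradient` are all hypothetical and do not come from the lesson.

```python
import random


def gradient(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


random.seed(0)

# Hypothetical toy dataset: y = 3x plus a little Gaussian noise.
xs = [i / 100 for i in range(1, 1001)]
data = [(x, 3.0 * x + random.gauss(0, 0.5)) for x in xs]

w = 0.0

# Gradient computed on every sample.
full_grad = gradient(w, data)

# Five gradient estimates, each from a random mini-batch of 8 samples.
batch_grads = [gradient(w, random.sample(data, 8)) for _ in range(5)]

print("full-batch gradient:", round(full_grad, 2))
print("mini-batch gradients:", [round(g, 2) for g in batch_grads])
```

Each mini-batch estimate differs from the full-batch value and from the other estimates, which is the noise that the improvements discussed next try to smooth out.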

To overcome some of these problems, many improvements have been proposed over the years.

Adding momentum

One of the basic improvements over SGD comes from adding the notion of momentum. Borrowing the principle of momentum from physics, we encourage SGD to keep moving in the same direction as in the previous timesteps. To accomplish this, we introduce two new variables: velocity and friction.

Velocity $v$ is computed as the running mean of the gradients up to the current timestep and indicates the direction in which the parameters should keep moving.
Friction $\rho$ ...
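To make the roles of velocity and friction concrete, here is a minimal sketch of one common formulation of SGD with momentum. It rests on assumptions not confirmed by the text above: the update rule `v = rho * v + grad` (a decaying accumulation of past gradients), the values `lr=0.1` and `rho=0.9`, the toy loss $f(w) = w^2$, and the function name `sgd_momentum_step` are all chosen for illustration only.

```python
def sgd_momentum_step(w, v, grad, lr=0.1, rho=0.9):
    """One SGD-with-momentum update (assumed formulation).

    v accumulates past gradients, and rho (friction) controls how
    quickly older gradients fade from the velocity.
    """
    v = rho * v + grad  # blend the previous velocity with the new gradient
    w = w - lr * v      # step along the velocity, not the raw gradient
    return w, v


# Toy problem (an assumption): minimize f(w) = w**2, whose gradient is 2 * w.
w, v = 5.0, 0.0
for _ in range(200):
    grad = 2 * w
    w, v = sgd_momentum_step(w, v, grad)

print("final w:", w)
```

With `rho = 0` the velocity equals the current gradient and the update reduces to plain SGD. Values of `rho` closer to 1 let earlier gradients influence the step for longer.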
