Gradient descent is an optimisation method for finding the minimum of a function. It is commonly used in deep learning models to update the weights of the neural network through backpropagation.
In this post, I will summarise the common gradient descent optimisation algorithms used in popular deep learning frameworks (e.g. TensorFlow, Keras, PyTorch, Caffe). The purpose of this post is to make it easy to read and digest (using consistent nomenclature) since there aren’t many of such summaries out there, and as a cheat sheet if you want to implement them from scratch.