Vanishing Gradients
The vanishing gradient problem arises in deep neural networks when gradients shrink toward zero as they are propagated backward through many layers, making it hard for the earliest layers to learn. Backpropagation computes each layer's gradient by the chain rule, multiplying together the local derivatives of every layer above it; when those factors are consistently less than one (as with saturating activations such as the sigmoid, whose derivative never exceeds 0.25), the product decays roughly exponentially with depth, and the weight updates for early layers become negligibly small.
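This decay can be observed directly with a manual backward pass. The sketch below (a minimal NumPy illustration, with an arbitrary 30-layer sigmoid stack and hypothetical weight scales chosen for demonstration) propagates a gradient backward through the stack and records its norm at each layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A deep stack of small dense layers with sigmoid activations.
# Layer count and weight scale are illustrative choices.
n_layers = 30
width = 16
weights = [rng.normal(0.0, 0.5, (width, width)) for _ in range(n_layers)]

# Forward pass, caching each layer's output for the backward pass.
a = rng.normal(size=width)
activations = [a]
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: start from a unit gradient at the output and apply
# the chain rule layer by layer. sigmoid'(z) = a * (1 - a) <= 0.25,
# so each step multiplies the gradient by factors below one.
grad = np.ones(width)
grad_norms = []
for W, out in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * out * (1.0 - out))
    grad_norms.append(np.linalg.norm(grad))

print(f"gradient norm after 1 layer:   {grad_norms[0]:.3e}")
print(f"gradient norm after {n_layers} layers: {grad_norms[-1]:.3e}")
```

Running this shows the gradient norm collapsing by many orders of magnitude between the last and first layers, which is why remedies such as ReLU activations, careful initialization, and residual connections are used in practice.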