What's the key idea behind RNNs?
An internal state that updates as a sequence is processed
What is the recurrence formula?
h_t = f_W(h_{t-1}, x_t) — for a vanilla RNN: h_t = tanh(W_hh h_{t-1} + W_xh x_t)
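As a sketch, the vanilla RNN update h_t = tanh(W_hh h_{t-1} + W_xh x_t + b) can be written in a few lines of NumPy (the weight shapes here are illustrative, not from the source):

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, b):
    """One vanilla RNN step: h_t = tanh(W_hh @ h_prev + W_xh @ x + b)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x + b)

# Tiny example: hidden size 3, input size 2, sequence length 5
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(3, 3)) * 0.1
W_xh = rng.normal(size=(3, 2)) * 0.1
b = np.zeros(3)

h = np.zeros(3)  # initial hidden state
for x in rng.normal(size=(5, 2)):
    h = rnn_step(h, x, W_hh, W_xh, b)  # same weights reused at every step
```

Note that the same W_hh and W_xh are applied at every time step; only the hidden state changes.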
What does a simple computational graph for an RNN look like?
The network unrolled over time: the same weights W are reused at every time step, each step taking x_t and h_{t-1} and producing h_t.
What's a problem with backprop in RNNs?
Backpropagation through time: forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.
Problem: this takes a lot of memory for long sequences!
What's a solution?
Truncated backpropagation through time: run forward and backward through chunks of the sequence instead of the whole sequence, carrying the hidden state across chunks.
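A forward-only sketch of the chunking idea (sizes and chunk length are illustrative; a real implementation would compute a loss and backprop within each chunk):

```python
import numpy as np

def rnn_forward(h, xs, W_hh, W_xh):
    """Run the vanilla RNN update over one chunk, returning the final state."""
    for x in xs:
        h = np.tanh(W_hh @ h + W_xh @ x)
    return h

rng = np.random.default_rng(1)
W_hh = rng.normal(size=(4, 4)) * 0.1
W_xh = rng.normal(size=(4, 3)) * 0.1
seq = rng.normal(size=(100, 3))  # long input sequence
chunk_len = 25

h = np.zeros(4)
for start in range(0, len(seq), chunk_len):
    chunk = seq[start:start + chunk_len]
    h = rnn_forward(h, chunk, W_hh, W_xh)
    # In a real framework: compute the chunk's loss here, backprop only
    # through this chunk, then detach h before processing the next chunk.
```

The hidden state h is carried across chunk boundaries, so later chunks still see earlier context, but gradients never flow back further than one chunk.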
What are use cases for RNNs?
Image captioning —> the output of a CNN can be fed into an RNN to generate captions
Translation
Speech recognition
What are typical problems for RNNs?
Vanishing gradient problem: Gradients become too small during backpropagation.
Long-term dependencies: Difficulty capturing information from earlier time steps in long sequences.
Training instability: Struggles to effectively train on long sequences.
Context "forgetting": RNNs lose earlier information as they process more steps.
LSTMs/GRUs: Solutions to handle long-range dependencies with gating mechanisms.
How is the vanishing gradient problem solved?
LSTMs (Long Short-Term Memory networks) solve the vanishing gradient problem by using gating mechanisms that control the flow of information through the network, allowing it to maintain long-term dependencies.
Forget Gate: Decides what information to discard from the cell state.
Input Gate: Determines what new information should be added to the cell state.
Output Gate: Controls what part of the cell state is passed to the output.
Cell State: The LSTM’s cell state provides a path for gradients to flow unchanged over long sequences, preventing them from vanishing.
Gates: By controlling information flow, LSTMs maintain important long-term dependencies while discarding irrelevant data.
This structure allows LSTMs to retain information over long sequences, addressing the vanishing gradient problem seen in vanilla RNNs.
—> Uninterrupted gradient flow!
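The gate equations above can be sketched as a single LSTM step in NumPy (the stacked-weight layout and sizes are illustrative assumptions, not from the source):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, W, b):
    """One LSTM step. W maps [h_prev; x] to the stacked i, f, o, g pre-activations."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:H])          # input gate: what new info to add
    f = sigmoid(z[H:2 * H])      # forget gate: what to discard from the cell state
    o = sigmoid(z[2 * H:3 * H])  # output gate: what part of the cell state to emit
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c = f * c_prev + i * g       # additive cell-state path -> uninterrupted gradient flow
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(2)
H, D = 3, 2  # hidden size, input size
W = rng.normal(size=(4 * H, H + D)) * 0.1
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(6, D)):
    h, c = lstm_step(h, c, x, W, b)
```

The key line is the cell-state update c = f * c_prev + i * g: because it is additive rather than repeatedly multiplied through a nonlinearity, gradients can flow back through c largely unchanged.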
What other RNN variants are there?
Over 10,000 different variants have been explored; a common one is the multilayer (stacked) RNN, where the hidden states of one layer serve as the input sequence to the next.