What's the key idea behind RNNs?
An internal state that updates as a sequence is processed
What is the recurrence formula?
h_t = f_W(h_{t-1}, x_t) — for a vanilla RNN: h_t = tanh(W_hh h_{t-1} + W_xh x_t)
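As a sketch, the vanilla RNN update h_t = tanh(W_hh h_{t-1} + W_xh x_t + b) can be written in a few lines of NumPy (the weight shapes here are illustrative, not from the source):

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, b):
    """One vanilla RNN step: h_t = tanh(W_hh @ h_prev + W_xh @ x + b)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x + b)

# Tiny example: hidden size 3, input size 2, sequence length 5
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(3, 3)) * 0.1
W_xh = rng.normal(size=(3, 2)) * 0.1
b = np.zeros(3)

h = np.zeros(3)  # initial hidden state
for x in rng.normal(size=(5, 2)):
    h = rnn_step(h, x, W_hh, W_xh, b)  # same weights reused at every step
```

Note that the same W_hh and W_xh are applied at every time step; only the hidden state changes.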
What does a simple computational graph for an RNN look like?
The network unrolled over time: the same weights W are reused at every time step, each step taking x_t and h_{t-1} and producing h_t.
What's a problem with backprop in RNNs?
Backpropagation through time: forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.
Problem: this takes a lot of memory for long sequences!
What's a solution?
Truncated backpropagation through time: run forward and backward through chunks of the sequence instead of the whole sequence, carrying the hidden state across chunks.
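A forward-only sketch of the chunking idea (sizes and chunk length are illustrative; a real implementation would compute a loss and backprop within each chunk):

```python
import numpy as np

def rnn_forward(h, xs, W_hh, W_xh):
    """Run the vanilla RNN update over one chunk, returning the final state."""
    for x in xs:
        h = np.tanh(W_hh @ h + W_xh @ x)
    return h

rng = np.random.default_rng(1)
W_hh = rng.normal(size=(4, 4)) * 0.1
W_xh = rng.normal(size=(4, 3)) * 0.1
seq = rng.normal(size=(100, 3))  # long input sequence
chunk_len = 25

h = np.zeros(4)
for start in range(0, len(seq), chunk_len):
    chunk = seq[start:start + chunk_len]
    h = rnn_forward(h, chunk, W_hh, W_xh)
    # In a real framework: compute the chunk's loss here, backprop only
    # through this chunk, then detach h before processing the next chunk.
```

The hidden state h is carried across chunk boundaries, so later chunks still see earlier context, but gradients never flow back further than one chunk.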
What are use cases for RNNs?
Image captioning —> the output of a CNN can be fed into an RNN to generate captions
Translation
Speech recognition
What are typical problems for RNNs?
Vanishing gradient problem: Gradients become too small during backpropagation.
Long-term dependencies: Difficulty capturing information from earlier time steps in long sequences.
Training instability: Struggles to effectively train on long sequences.
Context "forgetting": RNNs lose earlier information as they process more steps.
LSTMs/GRUs: Solutions to handle long-range dependencies with gating mechanisms.
How is the vanishing gradient problem solved?
LSTMs (Long Short-Term Memory networks) solve the vanishing gradient problem by using gating mechanisms that control the flow of information through the network, allowing it to maintain long-term dependencies.
Forget Gate: Decides what information to discard from the cell state.
Input Gate: Determines what new information should be added to the cell state.
Output Gate: Controls what part of the cell state is passed to the output.
Cell State: The LSTM’s cell state provides a path for gradients to flow unchanged over long sequences, preventing them from vanishing.
Gates: By controlling information flow, LSTMs maintain important long-term dependencies while discarding irrelevant data.
This structure allows LSTMs to retain information over long sequences, addressing the vanishing gradient problem seen in vanilla RNNs.
—> Uninterrupted gradient flow!
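The gate equations above can be sketched as a single LSTM step in NumPy (the stacked-weight layout and sizes are illustrative assumptions, not from the source):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, W, b):
    """One LSTM step. W maps [h_prev; x] to the stacked i, f, o, g pre-activations."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:H])          # input gate: what new info to add
    f = sigmoid(z[H:2 * H])      # forget gate: what to discard from the cell state
    o = sigmoid(z[2 * H:3 * H])  # output gate: what part of the cell state to emit
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c = f * c_prev + i * g       # additive cell-state path -> uninterrupted gradient flow
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(2)
H, D = 3, 2  # hidden size, input size
W = rng.normal(size=(4 * H, H + D)) * 0.1
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(6, D)):
    h, c = lstm_step(h, c, x, W, b)
```

The key line is the cell-state update c = f * c_prev + i * g: because it is additive rather than repeatedly multiplied through a nonlinearity, gradients can flow back through c largely unchanged.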
What other RNN variants are there?
Over 10,000 different variants have been explored; a common one is the multilayer (stacked) RNN, where the hidden states of one layer serve as the input sequence to the next.