What are RNNs?
RNN stands for recurrent neural network
a deep learning model that is trained to process and convert a sequential data input into a specific sequential data output
Why do we need RNNs?
speech recognition
music generation
sentiment classification
DNA sequence analysis
machine translation
video activity recognition
named entity recognition
can approximate any function (in theory)
What can be a problem with learning in RNNs?
overfitting
To train an RNN on long sequences, we must run it over many time steps, making the unrolled RNN a very deep network
It suffers from the unstable gradients problem: It may take forever to train, or training may be unstable.
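The unstable gradients problem can be illustrated with a small numerical sketch. It assumes a simplified linear RNN (no activation function) whose recurrent weight matrix is a scaled orthogonal matrix, so each backward step multiplies the gradient norm by exactly the scale factor. The function name `backprop_norms` and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np

def backprop_norms(w_scale, n_steps=50, n_hidden=16, seed=0):
    """Gradient norm after each backward step through a linear RNN (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumption: recurrent weights = scaled random orthogonal matrix,
    # so one backward step changes the gradient norm by a factor of w_scale.
    q, _ = np.linalg.qr(rng.standard_normal((n_hidden, n_hidden)))
    w_h = w_scale * q
    grad = np.ones(n_hidden)          # gradient arriving at the last time step
    norms = []
    for _ in range(n_steps):          # walk backward through the unrolled network
        grad = w_h.T @ grad
        norms.append(np.linalg.norm(grad))
    return norms

vanishing = backprop_norms(0.9)       # scale below 1: gradient shrinks toward 0
exploding = backprop_norms(1.1)       # scale above 1: gradient blows up
print(f"scale 0.9 after 50 steps: {vanishing[-1]:.3e}")
print(f"scale 1.1 after 50 steps: {exploding[-1]:.3e}")
```

Real RNNs add nonlinear activations and non-orthogonal weights, but the same repeated multiplication over many time steps is what makes the gradient shrink or grow.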
What are LSTMs and why do we need them? Provide a drawing and equations for LSTM gates.
The key idea is that the network can learn what to store in the long-term state, what to throw away, and what to read from it
An LSTM cell can learn to: recognize an important input (input gate), store it in the long-term state, preserve it for as long as it is needed (forget gate), and extract it whenever it is needed (output gate).
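The gate equations can be written as follows, in one common formulation (notation varies between textbooks and libraries). The symbols are assumptions of this sketch: x_t is the input at time step t, h_{t-1} the previous short-term state, c_{t-1} the previous long-term state, W and b the weights and biases of each gate, sigma the logistic function, and \otimes element-wise multiplication.

```latex
\begin{aligned}
\mathbf{i}_t &= \sigma\!\left(\mathbf{W}_{xi}^\top \mathbf{x}_t + \mathbf{W}_{hi}^\top \mathbf{h}_{t-1} + \mathbf{b}_i\right) && \text{(input gate)} \\
\mathbf{f}_t &= \sigma\!\left(\mathbf{W}_{xf}^\top \mathbf{x}_t + \mathbf{W}_{hf}^\top \mathbf{h}_{t-1} + \mathbf{b}_f\right) && \text{(forget gate)} \\
\mathbf{o}_t &= \sigma\!\left(\mathbf{W}_{xo}^\top \mathbf{x}_t + \mathbf{W}_{ho}^\top \mathbf{h}_{t-1} + \mathbf{b}_o\right) && \text{(output gate)} \\
\mathbf{g}_t &= \tanh\!\left(\mathbf{W}_{xg}^\top \mathbf{x}_t + \mathbf{W}_{hg}^\top \mathbf{h}_{t-1} + \mathbf{b}_g\right) && \text{(candidate values)} \\
\mathbf{c}_t &= \mathbf{f}_t \otimes \mathbf{c}_{t-1} + \mathbf{i}_t \otimes \mathbf{g}_t && \text{(long-term state)} \\
\mathbf{h}_t &= \mathbf{o}_t \otimes \tanh\!\left(\mathbf{c}_t\right) && \text{(short-term state and output)}
\end{aligned}
```

The forget gate decides what to throw away from c_{t-1}, the input gate decides which parts of g_t to store, and the output gate decides what to read from the long-term state.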
What are GRUs?
A GRU (gated recurrent unit) cell is a simplified version of the LSTM cell, and it seems to perform just as well
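For comparison with the LSTM, one common way to write the GRU equations is sketched below. Sign conventions for the update gate differ between sources, so treat this as one formulation rather than the definitive one. The symbols are assumptions of this sketch: z_t is the update gate, r_t the reset gate, g_t the candidate state, and \otimes element-wise multiplication.

```latex
\begin{aligned}
\mathbf{z}_t &= \sigma\!\left(\mathbf{W}_{xz}^\top \mathbf{x}_t + \mathbf{W}_{hz}^\top \mathbf{h}_{t-1} + \mathbf{b}_z\right) \\
\mathbf{r}_t &= \sigma\!\left(\mathbf{W}_{xr}^\top \mathbf{x}_t + \mathbf{W}_{hr}^\top \mathbf{h}_{t-1} + \mathbf{b}_r\right) \\
\mathbf{g}_t &= \tanh\!\left(\mathbf{W}_{xg}^\top \mathbf{x}_t + \mathbf{W}_{hg}^\top \left(\mathbf{r}_t \otimes \mathbf{h}_{t-1}\right) + \mathbf{b}_g\right) \\
\mathbf{h}_t &= \mathbf{z}_t \otimes \mathbf{h}_{t-1} + \left(1 - \mathbf{z}_t\right) \otimes \mathbf{g}_t
\end{aligned}
```

The simplification is visible here: there is a single state vector h_t instead of separate long-term and short-term states, and a single update gate z_t plays the role of both the forget and input gates.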
How can CNNs be used for sequence data processing?
A 1D convolutional layer slides several kernels across a sequence, producing a 1D feature map per kernel.
Each kernel will learn to detect a single very short sequential pattern (no longer than the kernel size).
One can build a neural network composed of a mix of recurrent layers and 1D convolutional layers (or even 1D pooling layers).
By shortening the sequences, the convolutional layer may help the LSTM/GRU layers to detect longer patterns.
If a 1D convolutional layer with a stride of 1 and "same" padding is used, then the output sequence will have the same length as the input sequence. But if "valid" padding or a stride greater than 1 is used, then the output sequence will be shorter than the input sequence.
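The effect of padding and stride on the output length can be checked with a short sketch. The helper names `conv1d_output_length` and `conv1d_valid` are hypothetical, and the "same" rule used here (output length = ceil(input length / stride)) is the convention followed by Keras-style layers; other frameworks may pad differently.

```python
import math
import numpy as np

def conv1d_output_length(n_in, kernel_size, stride=1, padding="valid"):
    """Length of the output sequence of a 1D convolution (Keras-style conventions assumed)."""
    if padding == "same":
        return math.ceil(n_in / stride)
    if padding == "valid":
        return (n_in - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

def conv1d_valid(x, kernel, stride=1):
    """Naive single-kernel 1D convolution with 'valid' padding (cross-correlation, as in deep learning)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(0, len(x) - k + 1, stride)])

x = np.arange(10.0)                   # toy input sequence of length 10
kernel = np.array([1.0, 0.0, -1.0])   # kernel of size 3

print(conv1d_output_length(10, 3, stride=1, padding="same"))   # same length as the input
print(conv1d_output_length(10, 3, stride=1, padding="valid"))  # shorter: no padding
print(conv1d_output_length(10, 3, stride=2, padding="valid"))  # shorter still: stride 2
print(len(conv1d_valid(x, kernel, stride=2)))                  # matches the formula
```

A stride of 2 roughly halves the sequence length, which is how a convolutional layer placed before LSTM/GRU layers shortens the sequences they have to process.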
What does an unrolled RNN look like? Provide a drawing.
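Unrolling means drawing (or running) the same recurrent cell once per time step, with every copy sharing the same weights. A minimal sketch of this for a basic RNN with a tanh activation follows; the function name `rnn_unrolled`, the weight names, and the sizes are assumptions made for illustration.

```python
import numpy as np

# Unrolled through time (each [cell] is the SAME cell with the same weights):
#
#   x_0        x_1        x_2
#    |          |          |
#  [cell] --> [cell] --> [cell] --> ...
#    |          |          |
#   h_0        h_1        h_2

def rnn_unrolled(xs, w_x, w_h, b, h0):
    """Run a basic RNN cell over every time step of xs, returning all hidden states."""
    h = h0
    states = []
    for x_t in xs:                    # one "copy" of the cell per time step
        h = np.tanh(w_x.T @ x_t + w_h.T @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_steps, n_in, n_hidden = 5, 3, 4     # hypothetical sizes
xs = rng.standard_normal((n_steps, n_in))
w_x = rng.standard_normal((n_in, n_hidden))
w_h = rng.standard_normal((n_hidden, n_hidden))
b = np.zeros(n_hidden)

states = rnn_unrolled(xs, w_x, w_h, b, h0=np.zeros(n_hidden))
print(states.shape)                   # (5, 4): one hidden state per time step
```

The longer the input sequence, the more copies of the cell appear in the unrolled view, which is why an RNN run over many time steps behaves like a very deep network.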
What can be a problem with learning in RNNs?
underfitting
not enough data
unstable gradient