How do Convolutional Neural Networks (CNNs) process text data?
CNNs slide a filter over the input in a window-by-window fashion, merging the values in each filter region into a single output value to detect local patterns.
Key Features:
Filter parameters are learned from the data during training.
Multiple filters are applied in parallel.
1D CNNs are used for text processing (see the sketch below).
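A minimal sketch of such a text CNN (PyTorch is assumed here; vocabulary size, dimensions, filter widths, and class count are illustrative, not from the card):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128,
                 num_filters=100, filter_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One Conv1d per filter width: each slides over the token
        # dimension and merges a window of embeddings into one value
        # per filter; the filters run in parallel.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k)
            for k in filter_sizes
        )
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embedding(token_ids)             # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                     # Conv1d expects (batch, channels, seq_len)
        # Max-pooling over time keeps the strongest local match per filter.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # (batch, num_classes)

logits = TextCNN()(torch.randint(0, 10000, (4, 20)))  # 4 sentences, 20 tokens each
```

Each filter width detects n-gram-like patterns of that length; max-pooling keeps only the strongest match, which is what makes the model sensitive to local patterns regardless of where they occur.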
What is the main difference between CNNs and RNNs in NLP?
CNNs focus on local patterns with filters and work well for text classification.
RNNs process sequences recursively, handling context-dependent tasks like translation.
How does a 1D CNN operate in Natural Language Processing (NLP)? Draw a picture
Draw an RNN as Encoder
What NLP tasks are commonly solved using RNNs?
Translation
Question answering
Summarization
Chatbots
Email auto-response
How does an RNN function as an encoder in NLP tasks?
In sequence modeling, an RNN encodes an input sequence into a single vector (its final hidden state) that represents the meaning of the input.
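A minimal sketch of an RNN used as an encoder (PyTorch assumed; a GRU stands in for any RNN cell, and all dimensions are illustrative):

```python
import torch
import torch.nn as nn

embed = nn.Embedding(10000, 64)    # illustrative vocab size / dims
rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

tokens = torch.randint(0, 10000, (1, 7))   # one sentence, 7 tokens
outputs, h_n = rnn(embed(tokens))

# h_n is the final hidden state: a single vector per sequence that
# summarizes ("encodes") the whole input.
print(h_n.shape)   # torch.Size([1, 1, 128]) -> (num_layers, batch, hidden)
```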
What are the main challenges in training RNNs?
Vanishing gradients → Hard to learn long-range dependencies.
Exploding gradients → Large weight updates can destabilize training (gradient clipping, sketched after this list, is the usual remedy).
Slow training compared to CNNs and Transformers.
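A minimal sketch of gradient clipping against exploding gradients (PyTorch assumed; the toy model and loss stand in for a real training setup):

```python
import torch
import torch.nn as nn

# Tiny illustrative RNN and dummy batch (shapes are arbitrary).
model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 30, 8)        # (batch, seq_len, features)
outputs, _ = model(x)
loss = outputs.pow(2).mean()     # stand-in loss

loss.backward()
# Rescale all gradients so their combined norm is at most 1.0;
# this prevents a single huge update from destabilizing training.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```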
3 Advantages of RNNs
Allow for arbitrarily sized inputs (model global patterns in sequence data)
Recursion gets unrolled to form a standard computation graph → trainable with backpropagation & gradient descent (see the sketch after this list)
Allow for conditional generation models (ability to condition the next output on an entire sentence history)
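A minimal sketch of the unrolling idea (PyTorch assumed; the cell, loop length, and dimensions are illustrative): the same cell is applied step by step, producing an ordinary computation graph that backpropagation can traverse.

```python
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=8, hidden_size=16)
inputs = torch.randn(5, 1, 8)      # 5 time steps, batch of 1
h = torch.zeros(1, 16)             # initial hidden state

for x_t in inputs:                 # the unrolled loop
    h = cell(x_t, h)               # same weights reused at every step

h.sum().backward()                 # gradients flow back through all 5 steps
```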
What is the Encoder-Decoder architecture, and how is it used?
The Encoder-Decoder model maps an input sequence to an output sequence, making it suitable for tasks such as translation and summarization.
Features:
Encoder processes the input into a vector.
Decoder generates the output based on the encoded information (see the sketch below).
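A minimal sketch of the whole Encoder-Decoder pipeline (PyTorch assumed; greedy decoding, no attention, and all names and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, max_len=10, bos_id=1):
        _, h = self.encoder(self.embed(src))   # encode input into vector h
        token = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
        generated = []
        for _ in range(max_len):               # decode one token at a time
            dec_out, h = self.decoder(self.embed(token), h)
            token = self.out(dec_out).argmax(dim=-1)
            generated.append(token)
        return torch.cat(generated, dim=1)

print(Seq2Seq()(torch.randint(0, 1000, (2, 6))).shape)  # (2, 10)
```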
Why is attention important in NLP models?
Attention helps the model focus on relevant parts of the input while making predictions.
Benefits:
Improves accuracy in long-sequence tasks.
Provides some interpretability by highlighting important words.
How does attention work in sequence-to-sequence models?
Computes a weighted average of the input states.
Uses softmax to assign importance scores.
Attention is parameterized and trained end-to-end with the model (see the sketch below).
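A minimal sketch of this weighted average (PyTorch assumed; simple dot-product scoring is used here, whereas the parameterized variants compute the scores with a small trained network):

```python
import torch
import torch.nn.functional as F

encoder_states = torch.randn(1, 7, 128)   # (batch, seq_len, hidden)
decoder_state = torch.randn(1, 128)       # current decoder hidden state

# Score each input position against the decoder state (dot product).
scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (1, 7)
weights = F.softmax(scores, dim=1)        # importance scores summing to 1

# Context vector = weighted average of the input states.
context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (1, 128)
```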
What role does the fully connected layer play in attention-based models?
The fully connected layer:
Contains trainable parameters.
Applies non-linear activation.
Helps combine weighted input features for the final output (see the sketch below).
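A minimal sketch of such a layer (PyTorch assumed; the "concatenate context and decoder state, then tanh" pattern shown here is one common design, not necessarily the one this card has in mind):

```python
import torch
import torch.nn as nn

hidden = 128
fc = nn.Linear(2 * hidden, hidden)     # trainable parameters

context = torch.randn(1, hidden)       # weighted average of the inputs
decoder_state = torch.randn(1, hidden)

# Combine the attention context with the decoder state through
# trainable weights and a non-linear activation.
combined = torch.tanh(fc(torch.cat([context, decoder_state], dim=1)))
# `combined` then feeds the output layer that predicts the next token.
```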
Draw Attention mechanism