What is the Transformer Architecture, and why is it important?
The Transformer is a neural network architecture that processes sequences without recurrence by using self-attention mechanisms.
Advantages:
More efficient to train than RNNs, since it does not have to process tokens one step at a time and can be parallelized.
Better at capturing long-range dependencies in text.
Foundation for state-of-the-art NLP models (BERT, GPT).
How does self-attention help the Transformer process sequences?
Self-attention allows each word to attend to every other word in a sequence, capturing relationships regardless of distance.
Key Properties:
Helps with contextualization (understanding word meaning based on surrounding context).
Enables parallel processing (unlike RNNs, which process data sequentially).
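A minimal NumPy sketch of the scaled dot-product self-attention described above (function name, shapes, and random weights are illustrative, not taken from any library):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence.

    X:           (seq_len, d_model) token embeddings
    Wq, Wk, Wv:  (d_model, d_k) learned projections (random here for illustration)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # every token scored against every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # context-weighted mix of values per token

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, embedding dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # -> (4, 8)
```

Because the score matrix is computed for all token pairs at once, the whole sequence can be processed in parallel rather than step by step.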
Why are Transformers computationally expensive?
Self-attention has O(n²) complexity in the sequence length, because every token interacts with every other token.
-> This makes processing long sequences expensive, requiring high memory and computing power.
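A rough sense of how this grows, counting the entries of the n × n score matrix (float32 assumed; this is plain arithmetic, not a benchmark):

```python
# memory for a single n x n attention-score matrix in float32 (one head, one layer)
for n in (512, 2048, 8192):
    pairs = n * n
    print(f"n={n:5d}  pairwise scores={pairs:>12,}  ~{pairs * 4 / 2**20:6.0f} MiB")
```

Doubling the sequence length quadruples both the number of pairwise scores and the memory they occupy.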
What is multi-head self-attention, and why is it useful?
Multi-head self-attention runs several attention heads in parallel, each with its own learned projections, letting the model focus on different parts of a sentence simultaneously.
Benefits:
Helps capture multiple relationships between words.
Improves contextual understanding.
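A self-contained NumPy sketch of the multi-head mechanism (head count, dimensions, and random weights are illustrative stand-ins for learned parameters):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads, d_model, rng):
    """Run several attention heads in parallel subspaces and concatenate them."""
    d_k = d_model // heads
    outputs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d_k))           # this head's own attention pattern
        outputs.append(A @ V)                         # (seq_len, d_k) per head
    Wo = rng.normal(size=(d_model, d_model))          # output projection
    return np.concatenate(outputs, axis=-1) @ Wo      # back to (seq_len, d_model)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                           # 4 tokens, d_model = 8
print(multi_head_attention(X, heads=2, d_model=8, rng=rng).shape)  # -> (4, 8)
```

Each head computes its own attention pattern over a lower-dimensional subspace, so different heads can specialize in different word relationships.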
What are the 4 main components of a Transformer model?
Self-attention layers → Capture word relationships.
Feedforward layers → Process attended features.
Layer normalization → Stabilizes training.
Positional encoding → Adds order information to words.
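As a concrete example of the fourth component, a short sketch of the sinusoidal positional encoding from the original Transformer paper (dimensions are illustrative):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: even dims use sine, odd dims use cosine,
    with wavelengths growing geometrically across dimensions."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model / 2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# added to the token embeddings so that attention can distinguish word order
print(positional_encoding(seq_len=6, d_model=8).shape)  # -> (6, 8)
```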
How does the Transformer differ from RNNs?
Transformer:
Uses self-attention instead of recurrence.
Processes entire sequences in parallel.
More scalable for large datasets.
RNNs:
Process sequences step by step (slower).
Struggle with long-range dependencies due to vanishing gradients.
What are the special tokens used in BERT?
[CLS] → Classification token (used to represent entire sequences).
[MASK] → Used in masked language modeling (MLM) for training.
[SEP] → Separator token (marks the boundary between sentence pairs and the end of the input).
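A quick way to see where these tokens are placed, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint are available:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Encoding a sentence pair inserts [CLS] at the start and [SEP] after each segment.
encoded = tokenizer("The cat sat on the mat.", "It was asleep.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# prints something like:
# ['[CLS]', 'the', 'cat', 'sat', 'on', 'the', 'mat', '.', '[SEP]', 'it', 'was', 'asleep', '.', '[SEP]']
```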
How does BERT’s pretraining process work?
Masked Language Model (MLM) – Predicts masked words in a sentence.
Next Sentence Prediction (NSP) – Determines if one sentence follows another.
After pretraining, BERT is fine-tuned for specific NLP tasks.
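The MLM half of this can be probed with a fill-mask pipeline (again assuming the transformers library and the bert-base-uncased checkpoint; the predictions shown will vary by model):

```python
from transformers import pipeline

# BERT was pretrained to recover [MASK] tokens from their bidirectional context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The Transformer relies on [MASK] attention instead of recurrence."):
    print(f"{pred['token_str']:>12s}  {pred['score']:.3f}")
```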
What is SPLADE, and how does it differ from BERT?
SPLADE combines BERT’s semantic understanding with sparse, interpretable representations.
-> It generates sparse vectors by activating only relevant terms from a vocabulary.
How does SPLADE generate sparse and interpretable vector representations?
Uses BERT embeddings to encode each input token.
Expands the dense vectors into weights over the full vocabulary (term expansion).
Enforces sparsity of the resulting vectors through an activation function (log of 1 + ReLU) -> keeping only the most relevant terms.
Applies regularization during training to further control the sparsity.
-> This makes SPLADE more explainable and efficient than fully dense retrieval models.
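A rough sketch of that pipeline with the transformers library. The checkpoint name is a published SPLADE model cited from memory, and the log(1 + ReLU) plus max-pooling step follows the SPLADE papers, so treat this as illustrative rather than a faithful reimplementation:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "naver/splade-cocondenser-ensembledistil"   # published SPLADE checkpoint (name from memory)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

text = "what is the transformer architecture"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                      # (1, seq_len, vocab_size) MLM scores

# log-saturated ReLU keeps weights positive and dampens large activations,
# then max-pooling over the sequence leaves one weight per vocabulary term
weights = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)

# the resulting vector is sparse: most vocabulary terms get zero weight
nonzero = weights.nonzero().squeeze(1)
top = nonzero[weights[nonzero].argsort(descending=True)][:10]
for idx in top:
    print(f"{tokenizer.convert_ids_to_tokens(int(idx)):>15s}  {weights[idx]:.2f}")
```

The nonzero terms can be read off directly as vocabulary words, which is what makes the representation interpretable and compatible with inverted-index retrieval.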