Why are count-based language models (e.g., N-Grams) insufficient?
Limitations of count-based models:
Struggle with long-distance dependencies.
Require large corpora to estimate probabilities accurately.
Have an exponential growth in model size as context length increases.
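To make the sparsity point concrete, here is a minimal bigram (2-gram) sketch with a made-up toy corpus; any context not seen in training gets probability zero:

```python
from collections import Counter

# Tiny toy corpus (illustrative only, not from the card)
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count unigrams and bigrams
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """MLE estimate P(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(bigram_prob("the", "cat"))   # seen bigram  -> non-zero probability
print(bigram_prob("the", "bird"))  # unseen bigram -> 0.0 (sparsity problem)
```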
How does the Fixed-Window Neural Language Model work?
It concatenates the embeddings of the previous n words, feeds them through a hidden layer, and outputs a softmax distribution over the vocabulary.
Training: Optimize its parameters θ such that the model assigns high probability to the target (next) word.
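A minimal PyTorch sketch of this architecture, assuming illustrative sizes (vocab_size, window, emb_dim, hidden_dim are not from the card):

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    """Minimal fixed-window neural LM: concat embeddings -> hidden layer -> softmax."""
    def __init__(self, vocab_size=10000, window=4, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)           # word embeddings
        self.hidden = nn.Linear(window * emb_dim, hidden_dim)  # W grows with the window
        self.out = nn.Linear(hidden_dim, vocab_size)           # scores over the vocabulary

    def forward(self, window_ids):               # window_ids: (batch, window)
        e = self.emb(window_ids).flatten(1)      # (batch, window * emb_dim)
        h = torch.tanh(self.hidden(e))           # hidden representation
        return self.out(h)                       # logits; softmax / cross-entropy follows

# Training objective: assign high probability to the gold next token,
# i.e. minimize the cross-entropy between the logits and the target word.
model = FixedWindowLM()
logits = model(torch.randint(0, 10000, (2, 4)))              # two example windows
loss = nn.functional.cross_entropy(logits, torch.tensor([42, 7]))
```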
What are the 3 steps for feeding text to a neural net?
Steps for converting text into input:
Turn each word into a unique index.
Convert the index into a one-hot vector.
Use matrix multiplication to lookup the word embedding.
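The one-hot/matrix-multiplication view can be checked in a few lines of NumPy (vocabulary and embedding dimensions are illustrative):

```python
import numpy as np

vocab = ["the", "cat", "sat"]            # step 1: each word gets a unique index
idx = vocab.index("cat")                 # -> 1

one_hot = np.zeros(len(vocab))           # step 2: one-hot vector for that index
one_hot[idx] = 1.0

E = np.random.randn(len(vocab), 4)       # embedding matrix, one 4-dim row per word
embedding = one_hot @ E                  # step 3: matrix multiplication selects row idx

assert np.allclose(embedding, E[idx])    # identical to a direct table lookup
```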
What are the remaining problems of the fixed-window neural LM? (3)
Fixed window is too small
Enlarging the window enlarges W (the window can never be large enough).
It’s not deep enough to capture nuanced contextual meanings
How do Neural Language Models improve upon N-Grams? (2)
Key improvements:
Tackle the sparsity problem by learning dense embeddings.
Model size is O(n), not O(exp(n)), with n being the window size (see the size comparison below).
Neural LMs share information across semantically similar prefixes and thereby overcome the sparsity issue (N-Grams treat all prefixes as independent).
Effect: Better generalization to unseen text.
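A back-of-the-envelope size comparison, assuming an illustrative vocabulary of 10,000 words, a window of 4, and small embedding/hidden sizes (none of these numbers come from the card):

```python
# Illustrative sizes, not from the card
V, n, d, h = 10_000, 4, 64, 128

# Count-based n-gram model: one entry per possible n-gram -> exponential in n
ngram_table = V ** n                         # 1e16 possible 4-grams

# Fixed-window neural LM: embeddings + hidden layer + output layer -> linear in n
neural_params = V * d + (n * d) * h + h * V  # roughly 2 million parameters

print(f"{ngram_table:.2e} vs {neural_params:.2e}")
```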
What are the challenges of RNNs?
Limitations of RNNs:
They quickly forget portions of the input.
Vanishing gradients → Hard to retain long-term dependencies.
Exploding gradients → Large updates destabilize training.
Difficult to parallelize → Must process text sequentially.
→ Solution: Use Self-Attention (Transformers).
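A toy illustration of the vanishing-gradient and sequential-processing points, using an assumed scalar recurrence h_t = w * h_{t-1} (the weight value 0.5 is arbitrary):

```python
import torch

# Backprop through the recurrence multiplies by w at every step, so the gradient
# w.r.t. the first hidden state shrinks when |w| < 1 and blows up when |w| > 1.
w = torch.tensor(0.5)
h0 = torch.tensor(1.0, requires_grad=True)

h = h0
for _ in range(50):       # 50 time steps, processed strictly one after another
    h = w * h

h.backward()
print(h0.grad)            # 0.5 ** 50 ≈ 8.9e-16 -> vanishing gradient
```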
What is the role of Self-Attention in language modeling?
Self-Attention helps models focus on relevant words in a sentence.
Benefits:
Creates context-aware representations.
Helps maintain long-range dependencies.
Fewer sequential operations: O(1) vs. O(n) for recurrent models.
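A minimal sketch of scaled dot-product self-attention; the dimensions and random projection matrices are illustrative, not a full Transformer layer. Note that all positions are handled in one batch of matrix multiplications, hence the O(1) sequential operations:

```python
import torch
import torch.nn.functional as F

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project into queries, keys, values
    scores = Q @ K.T / K.shape[-1] ** 0.5     # every position scores every other position
    weights = F.softmax(scores, dim=-1)       # relevance of each word to each word
    return weights @ V                        # context-aware representations

d = 16
X = torch.randn(5, d)                         # 5 tokens with illustrative dimensions
out = self_attention(X, torch.randn(d, d), torch.randn(d, d), torch.randn(d, d))
print(out.shape)                              # torch.Size([5, 16])
```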
How is a Transformer trained for Language Modeling?
Training process:
Each position predicts the next token (the target sequence is the input shifted by one position).
Compute token-wise probability distributions over vocabulary.
Calculate the loss by comparing predictions to actual tokens.
Backpropagate and update parameters.
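A sketch of how the shift-by-one targets and the token-wise loss fit together; the token ids are made up and random logits stand in for a real Transformer's output:

```python
import torch
import torch.nn.functional as F

tokens = torch.tensor([[5, 8, 2, 9, 3]])          # one example sequence of token ids
inputs = tokens[:, :-1]                           # fed to the model:  [5, 8, 2, 9]
targets = tokens[:, 1:]                           # gold next tokens:  [8, 2, 9, 3]

# Stand-in for Transformer output logits: (batch, positions, vocab_size)
logits = torch.randn(1, 4, 100, requires_grad=True)

# Token-wise distributions compared against the gold tokens, combined into one loss
loss = F.cross_entropy(logits.reshape(-1, 100), targets.reshape(-1))
loss.backward()                                   # backpropagate, then update parameters
```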
Steps for training a Transformer LM:
Compute, for each position, its probability distribution over the whole vocabulary.
Compute, for each position, the loss between that distribution and the gold output label.
Sum the position-wise loss values to obtain a global loss.
Using this loss, backpropagate and update the Transformer parameters.
Use an attention mask to prevent information leakage from future tokens.
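A minimal sketch of the causal (look-ahead) mask that enforces the last step; the sequence length and scores are illustrative:

```python
import torch

seq_len = 5

# Causal mask: position i may only attend to positions <= i
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

scores = torch.randn(seq_len, seq_len)              # raw attention scores
scores = scores.masked_fill(~mask, float("-inf"))   # hide future positions
weights = torch.softmax(scores, dim=-1)             # rows sum to 1 over visible tokens

print(weights[0])                                   # the first token sees only itself
```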