What is Reinforcement Learning (RL), and what are the 4 key components?
Reinforcement Learning (RL) involves an agent interacting with an environment by taking actions.
Key components:
State → Representation of the world at a moment.
Action → Decision made by the agent.
Reward → Feedback from the environment.
Policy function → Defines which action the agent takes in a given state.
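Tying the four components together, here is a minimal self-contained sketch; the toy 1-D environment and the policy below are made up for illustration, not from the source:

import random

# Toy environment: the agent starts at position 0 and tries to reach position 3.
def policy(state):
    # Policy: decides the action in a given state; here, step right with probability 0.8
    return +1 if random.random() < 0.8 else -1

state, done = 0, False
while not done:
    action = policy(state)            # Action: decision made by the agent
    state = state + action            # State: representation of the world after the step
    reward = 1 if state == 3 else 0   # Reward: feedback from the environment
    done = (state == 3)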
Draw the Reinforcement Learning loop
What is Reinforcement Learning from Human Feedback (RLHF)? Name the 2 process steps
RLHF optimizes language models by using human preferences to guide training.
Reward function R(s; prompt), where s is the model's output.
The reward is higher when humans prefer the output.
Process:
1. Estimate the reward function R(s; prompt).
2. Find the best generative model p that maximizes the expected reward.
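In symbols (a sketch in the card's own notation, with p(s | prompt) the model's distribution over outputs s):

maximize over p:   E_{s ~ p(s | prompt)} [ R(s; prompt) ]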
What are the 2 main approaches for estimating the reward R, and how do they compare?
Two main approaches:
Ask humans to provide absolute scores for each output → human judgement on different instances / by different people can be noisy or mis-calibrated.
Ask for pairwise comparisons → can be more reliable (see the sketch below).
Scaling reward models: a large enough reward model trained on enough data approaches human performance.
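To make the pairwise approach concrete, here is a minimal Bradley-Terry-style loss sketch; reward_model is a hypothetical network that scores a (prompt, output) pair with a scalar, not an API from the source:

import torch.nn.functional as F

def pairwise_loss(reward_model, prompt, preferred, rejected):
    r_pref = reward_model(prompt, preferred)   # scalar reward for the preferred output
    r_rej = reward_model(prompt, rejected)     # scalar reward for the rejected output
    # Train the model to rank the preferred output above the rejected one
    return -F.logsigmoid(r_pref - r_rej).mean()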
Name the 5 Steps for Reinforcement Learning from Human Feedback
1. Collect a dataset of human preferences → human annotators rank outputs by preferability.
2. Use this data to train a reward model → the reward model returns a scalar reward that numerically represents the human preference.
3. Learn a policy (a language model) that optimizes against the reward model.
4. Periodically retrain the reward model with more samples and human feedback.
5. Add a penalty term penalizing deviations from the distribution of the pretrained LM (one common form is shown below) → this fixes the problem that the system otherwise learns to "cheat" by producing gibberish / irrelevant outputs that maximize reward.
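One common form of that penalty (a sketch following the InstructGPT-style formulation; β is the penalty weight, p_RL the tuned policy, p_PT the pretrained LM):

R_total(s; prompt) = R(s; prompt) − β · log( p_RL(s | prompt) / p_PT(s | prompt) )

The log-ratio term grows when the tuned model drifts away from the pretrained distribution, so gibberish that games R gets penalized.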
What are the challenges in RLHF?
Challenges:
The model may learn to "cheat" by generating gibberish outputs that maximize reward scores.
Difficult to train good reward models.
Requires a lot of human annotations.
How do reward models improve LLM behavior?
Reward models can enforce desired behaviors, such as:
Avoiding bias.
Avoiding toxic outputs.
Staying within the model’s knowledge scope.
Name 2 limitations of Reinforcement Learning
Tricky to get right
Training a good reward model may require a lot of annotations
Name 2 approaches for overcoming the challenge that LMs need to process massive amounts of data
Scale up the model and train it with a longer context window → bottleneck: memory usage and the number of operations in self-attention grow quadratically with sequence length.
Sparse attention patterns → improve efficiency by making the attention operations sparse.
What are Sparse Attention Patterns, and how do they help LLMs?
Sparse Attention reduces the number of attention computations by focusing only on selected tokens.
Implementation strategies:
Different layers and attention heads follow different sparsity patterns.
Earlier layers use sparser attention (see the sketch below).
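As a concrete illustration, here is a minimal sketch of one such pattern, a causal sliding-window mask; the sizes n and w are made-up values, not from the source:

import torch

n, w = 8, 2                           # sequence length, local window size
i = torch.arange(n).unsqueeze(1)      # query positions (column vector)
j = torch.arange(n).unsqueeze(0)      # key positions (row vector)
mask = (j <= i) & (j >= i - w)        # causal + local window: O(n*w) work, not O(n^2)
# Attention scores outside the mask would be set to -inf before the softmax:
# scores = scores.masked_fill(~mask, float("-inf"))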
What is Retrieval-Augmented Generation (RAG), and why is it useful?
RAG combines LLMs with external data to improve responses.
Why is it useful?
LLMs cannot memorize all knowledge.
LLM knowledge can be outdated and hard to update.
LLM output is challenging to interpret and verify.
LLMs are expensive to train; reducing their size while retrieving information is more efficient.
What are the 3 key design questions when implementing RAG?
Key design decisions in RAG:
Memories: What should be stored as memory? (Documents, databases, etc.).
Memory retrieval method: How should retrieval work? (Pretrained retriever or off-the-shelf search).
Retrieved-memory usage: How should retrieved information be used? (fused into the prompt or the model input; see the sketch below).
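A minimal sketch of the prompt-fusion design; retriever and generate are hypothetical stand-ins for a retrieval backend and an LLM call, not APIs from the source:

def rag_answer(question, retriever, generate, k=3):
    docs = retriever(question, k=k)    # memory retrieval step: fetch top-k documents
    context = "\n".join(docs)          # the retrieved memories
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)            # the LLM conditions on the retrieved text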
What are common failures in Retrieval-Augmented Generation (RAG)?
Two major failure modes:
Underutilization → The model ignores retrieved memories.
Overreliance → The model depends too much on retrieved information.
Why do LLMs need external retrieval mechanisms?
Limitations of storing all knowledge in LLMs:
Cannot store long-tail knowledge efficiently.
Difficult to update (LLM retraining is expensive).
Hard to verify generated facts.
Draw a regular decoder with IR embeddings