What are the features of Amazon Alexa?
combines several natural language processing methods in a single service (speech synthesis, speech recognition, …)
cannot engage in sophisticated dialogues
only limited to a set of user commands
mainly a language-based user interface to web services
What are the features of Google LaMDA?
based on transformer architecture
-> specifically designed for dialogue generation (i.e. chatbots)
incorporates metrics to improve quality of conversation
sensibleness
specificity
interestingness
helpfulness
training includes dialogues that were evaluated by humans according to these metrics
system can additionally access external data sources to answer questions
e.g. calculator, web search
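A toy sketch of routing a question to an external tool (the routing rule and both tool stand-ins are assumptions for illustration, not LaMDA's actual mechanism):

```python
import re

# Hypothetical dispatcher: route arithmetic to a "calculator",
# everything else to a stand-in "web search". Purely illustrative.
def answer(question):
    # treat input made only of digits/operators as a calculator query
    if re.fullmatch(r"[\d\s+\-*/().]+", question.strip()):
        return str(eval(question))        # calculator tool (trusted input only)
    return f"[web search: {question}]"    # placeholder for a search tool

print(answer("2 * (3 + 4)"))              # calculator path -> "14"
print(answer("who wrote Faust?"))         # search path
```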
What does multimodality mean w.r.t. ML models?
e.g. GPT-4 is multimodal
-> can process both text and images…
What is the connection between model size and performance in LLMs?
language models scale well with their number of parameters
=> rapid growth of SotA language models…
What are the four key innovations behind ChatGPT?
training of a generative model
self-supervised training alleviates need for manual labeling
new neural network architecture for text called transformer
equivalent of CNNs for text…
new training procedure for learning dialogues
optimization of model for human preferences
massively large scaling of the model
improving performance with data and computing power
What is the main difference between generative and discriminative models?
discriminative:
classify data
generative:
generate new data (e.g. images, text, audio, etc.)
=> ChatGPT is generative for natural language
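A toy contrast (the spam rule and the corpus are made up): a discriminative model maps input to a label, a generative model samples new data:

```python
import random

def classify(text):
    """Discriminative (toy rule): label a message spam or ham."""
    return "spam" if "prize" in text else "ham"

def generate(corpus, length, seed=0):
    """Generative (character bigrams): sample a new string from corpus statistics."""
    random.seed(seed)
    nxt = {}
    for a, b in zip(corpus, corpus[1:]):
        nxt.setdefault(a, []).append(b)       # successor statistics
    ch = random.choice(corpus)
    out = ch
    while len(out) < length:
        ch = random.choice(nxt.get(ch, corpus))
        out += ch
    return out

print(classify("you won a prize"))            # -> spam
print(generate("banana bandana", 8))          # a new 8-character string
```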
What was an early predecessor to generative models?
T9 - text input method for phones
-> instead of pressing a digit key multiple times -> press once
-> use internal dictionary to determine words matching the sequence of pressed keys…!
at the beginning -> rank candidates by frequency of use
frequencies adapted over time according to user preferences
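The dictionary lookup above can be sketched as follows (the keypad mapping is the standard phone layout; the word frequencies are made-up stand-ins for usage counts):

```python
# Standard phone keypad: each digit 2-9 covers several letters.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}

def word_to_keys(word):
    """Translate a word into its T9 digit sequence."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())

def t9_candidates(keys, dictionary):
    """Return dictionary words matching the pressed keys, most frequent first."""
    matches = [w for w in dictionary if word_to_keys(w) == keys]
    return sorted(matches, key=dictionary.get, reverse=True)

# Hypothetical usage frequencies; a real phone adapts these over time.
freq = {"good": 50, "home": 40, "gone": 30, "hood": 10}
print(t9_candidates("4663", freq))  # good, home, gone, hood all map to 4663
```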
How do NN for NLP learn word representations?
learn statistical relations between words
=> words are in general characterized by their context (i.e. word2vec…)
GPT: uses a mechanism called self-attention to learn contextual relationships between words…
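A minimal single-head scaled dot-product self-attention sketch (toy dimensions and random weights; real GPT stacks many such heads and layers):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors X (n x d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise relevance of words
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # context-mixed representations

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 "words", 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # each word vector now encodes information from all others
```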
What is a possible explanation of how ChatGPT can understand language?
similar to CNN
-> convolutions capture locality of features
self-attention captures semantic relationships between words in texts
=>
deep CNNs learn increasingly abstract image features
-> transformers learn increasingly abstract concepts
What does the pre-trained part mean in ChatGPT?
pre-trained for general text prediction
-> to answer questions in a dialogue-based manner
=> needs training with data specific to that task…
What is the learning process in ChatGPT?
3 steps
Step 1:
collect demonstration data and train a supervised policy
Step 2:
collect comparison data and train a reward model
Step 3:
optimize a policy against the reward model using the PPO reinforcement learning algorithm
How does step 1 of ChatGPT's training work?
sample prompt from prompt dataset
e.g. “explain reinforcement learning to a 6 year old”
a labeler demonstrates the desired output behavior
i.e. the labeler writes an answer
e.g. “we give treats and punishments to teach…”
this data is used to fine-tune GPT with supervised learning
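The fine-tuning objective of this step can be sketched as minimizing cross-entropy on the labeler's demonstration (toy vocabulary and hand-set probabilities, not a real GPT):

```python
import math

def cross_entropy(predicted_probs, target_tokens):
    """Negative log-likelihood of the demonstration tokens under the model."""
    return -sum(math.log(predicted_probs[t]) for t in target_tokens)

# Hypothetical per-token probabilities the model assigns after the prompt.
probs = {"we": 0.2, "give": 0.1, "treats": 0.05}
demonstration = ["we", "give", "treats"]  # the labeler's answer, tokenized
loss = cross_entropy(probs, demonstration)
# gradient descent on this loss nudges the model toward the labeler's answer
print(round(loss, 3))
```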
How does step 2 of ChatGPT's training work?
a prompt and several model outputs are sampled
i.e. prompt: “explain reinforcement learning to a 6 year old”
model 1 answer: i.e. in reinforcement learning, the agent is…
model 2 answer: i.e. explain rewards…
model 3: we give treats and punishments to teach…
a labeler (human) ranks the outputs from best to worst
i.e. model 3 > model 1 > model 2…
data used to train reward model
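The reward model's training signal can be sketched with a pairwise ranking loss over the human ranking (the scalar rewards here are hand-set assumptions; a real reward model scores full prompt/answer pairs):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ranking_loss(ranked_rewards):
    """Sum -log sigmoid(r_better - r_worse) over all (better, worse) pairs."""
    loss = 0.0
    for i in range(len(ranked_rewards)):
        for j in range(i + 1, len(ranked_rewards)):
            loss += -math.log(sigmoid(ranked_rewards[i] - ranked_rewards[j]))
    return loss

# Hypothetical rewards for the ranking "model 3 > model 1 > model 2".
good = ranking_loss([2.0, 1.0, 0.0])   # rewards agree with the ranking
bad = ranking_loss([0.0, 1.0, 2.0])    # rewards contradict the ranking
# the loss is lower when the reward model agrees with the human ranking
print(good < bad)
```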
What are some limits of AI systems based on GPT?
are statistical models
=> can only approximate exact computation to some extent (i.e. arithmetic, algorithms, rules, …)
How does step 3 of ChatGPT's training work?
new prompt is sampled from dataset
i.e. write a story about otters…
PPO model is initialized from the supervised policy
policy generates an output
reward model calculates a reward for the output
reward is used to update the policy using PPO
=> train reward model on some samples
=> use reward model to train the policy on many more samples…
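The loop above can be sketched as a one-step bandit with a REINFORCE-style update instead of full PPO (the two canned answers, the fixed reward model, and all numbers are illustrative assumptions):

```python
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

reward_model = [0.0, 1.0]   # stand-in: scalar reward for each candidate answer
logits = [0.0, 0.0]         # policy starts indifferent between the answers
random.seed(0)
lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    a = random.choices([0, 1], weights=probs)[0]   # policy generates an output
    r = reward_model[a]                            # reward model scores it
    baseline = sum(p * rw for p, rw in zip(probs, reward_model))
    # REINFORCE: grad log pi(i) is (1 - pi(i)) for the chosen action, -pi(i) else
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * (r - baseline) * grad
final = softmax(logits)
# after training, the policy strongly prefers the answer the reward model likes
print(final)
```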