Artificial Intelligence
Definition
Agent
Environment
Sensing
Acting
What is machine learning?
A computer program is said to learn
from experience with respect to some class of tasks and a performance measure
if its performance at those tasks, as measured by the performance measure, improves with experience.
Machine Learning can be subdivided into solving three different tasks:
Clustering: Separate data points into some groups
Regression: Fit functions to data points
Classification: Separate data points into predefined groups
What general approaches are there to make an algorithm learn from data?
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Approach
Challenges
Approach: Use examples of input data and assigned output data (labels) to find an approximation of their relationship
Challenges: Where do we get the labels from?
Approach: Find patterns in data without knowing target values
Challenges: What patterns are we looking for? How do we know which patterns are useful?
Approach: Let the system try different approaches and reward desired states of the system
Challenges: What should the system try? What behavior led to the success? What is success? (Exploration vs. Exploitation and credit assignment problem)
What are hyperparameters?
Hyperparameters are external configuration variables that data scientists use to control the training of machine learning models.
e.g. the number of nodes and layers in a neural network
We want to know whether the model fits data well …
Verification measures the model's performance on the data used for selecting the model's parameters
The user wants to apply a model to data that was not used when finding the model's parameters …
Validation measures the model's performance on data not used for selecting the model's parameters
The user wants an unbiased estimate of the model's performance
Testing allows computing an unbiased estimate of the model's performance on data not used in the learning problem
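A minimal sketch of how the three measurements map onto a data split: verification uses the training part, validation uses a held-out part that plays no role in fitting the parameters, and testing uses a part that is not touched at all during the learning problem. The 60/20/20 fractions, the fixed seed and the helper name `split_dataset` are illustrative assumptions, not part of the original notes.

```python
import random


def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the samples and split them into train, validation and test parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]                  # used to fit the parameters
    val = shuffled[n_train:n_train + n_val]     # used to compare candidate models
    test = shuffled[n_train + n_val:]           # used once, for the final estimate
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The three parts are disjoint, so a score computed on `test` is not influenced by any choice made while fitting or selecting the model.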
Graph including
Out of sample error
In sample error
Bias
Variance
6 Step Supervised Learning Process
Explain
epoch
batch
Using the complete data set once to update the weights → epoch
Part of the data set used to calculate one adjustment of the weights 𝑾 → batch
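The relationship between the two terms can be sketched as a nested loop: the outer loop counts epochs, the inner loop walks through the batches of one epoch. The data set size, the batch size and the helper name `iterate_batches` are assumptions chosen for illustration.

```python
def iterate_batches(num_samples, batch_size):
    """Yield (start, stop) index ranges for the batches of one epoch."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


num_samples, batch_size, num_epochs = 10, 4, 2
updates = 0
for epoch in range(num_epochs):
    for start, stop in iterate_batches(num_samples, batch_size):
        # One weight adjustment would be computed from samples[start:stop].
        updates += 1

print(updates)  # 3 batches per epoch * 2 epochs = 6
```

The last batch of an epoch is smaller here (2 samples instead of 4) because 10 is not a multiple of 4.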
Method to minimize the loss function
(Stochastic) gradient descent
The step size along the negative gradient is called the learning rate. The choice of the learning rate is crucial because it affects whether we find a (local) minimum and at what speed.
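A minimal sketch of stochastic (mini-batch) gradient descent on a toy problem, assuming a one-parameter model y_hat = w * x, a mean squared error loss and noise-free data generated from y = 2x. All names and values are illustrative assumptions.

```python
import random

# Toy data for y = 2x (noise-free, assumed for illustration).
xs = [0.1 * i for i in range(1, 11)]
data = [(x, 2.0 * x) for x in xs]


def batch_gradient(w, batch):
    """Gradient of the mean squared error of the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def sgd(data, learning_rate=0.5, batch_size=2, epochs=50, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            # Step along the negative gradient, scaled by the learning rate.
            w -= learning_rate * batch_gradient(w, batch)
    return w


w = sgd(data)
print(w)  # close to 2.0
```

Each update uses only one batch, so the gradient is an estimate of the full-data gradient. On this noise-free toy problem every batch still points toward w = 2.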
Effect of learning rate on convergence
η < η_opt
η = η_opt
η > η_opt
η > 2η_opt
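The four regimes can be reproduced on a one-dimensional quadratic loss L(w) = 0.5 * c * w^2, for which the optimal learning rate is 1 / c. This is a sketch under that assumption; for other loss functions the thresholds hold only approximately, near a minimum. The curvature value and function name are illustrative.

```python
def run_gradient_descent(eta, curvature=2.0, w0=1.0, steps=20):
    """Minimize L(w) = 0.5 * curvature * w**2 and return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = curvature * w
        w = w - eta * grad
    return abs(w)


eta_opt = 1.0 / 2.0  # for this quadratic, eta_opt = 1 / curvature

for factor in (0.5, 1.0, 1.5, 2.5):
    print(factor, run_gradient_descent(factor * eta_opt))
```

With a factor of 0.5 the iterate approaches the minimum slowly from one side, with 1.0 it reaches the minimum in a single step, with 1.5 it overshoots and oscillates but still converges, and with 2.5 every step increases the distance to the minimum, so the run diverges.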
Which optimization methods based on (stochastic) gradient descent try to mitigate potential problems such as "skipping" minima and slow training in flat areas?
Momentum, Nesterov Accelerated Gradient:
Keep moving in the recent direction → more independent of noise in the batch
Adagrad, Adadelta, Rmsprop, Adam:
Make use of an approximation of the Hessian (per-parameter step sizes derived from accumulated squared gradients) → small steps in steep areas, large steps in flat areas
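A sketch of one update step for each family, using classical momentum and RMSprop as representatives. The hyperparameter values (`lr`, `beta`, `decay`, `eps`) are common defaults chosen for illustration and are assumptions; library implementations differ in details.

```python
import math


def momentum_step(w, velocity, grad, lr=0.1, beta=0.9):
    """Classical momentum: keep a decaying sum of past update directions."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity


def rmsprop_step(w, avg_sq, grad, lr=0.1, decay=0.9, eps=1e-8):
    """RMSprop: divide the step by a running root-mean-square of the gradient."""
    avg_sq = decay * avg_sq + (1.0 - decay) * grad * grad
    return w - lr * grad / (math.sqrt(avg_sq) + eps), avg_sq


# Momentum: two steps with the same gradient build up velocity.
w, v = 0.0, 0.0
w, v = momentum_step(w, v, grad=1.0)
w, v = momentum_step(w, v, grad=1.0)
print(v)

# RMSprop: the step length barely depends on the gradient magnitude.
steep_w, _ = rmsprop_step(0.0, 0.0, grad=100.0)
flat_w, _ = rmsprop_step(0.0, 0.0, grad=0.01)
print(abs(steep_w), abs(flat_w))
```

Plain gradient descent with the same learning rate would move by 10 in the steep case and by 0.001 in the flat case. The normalized step is roughly the same length in both, which is smaller than plain gradient descent in steep areas and larger in flat ones.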
Explain underfitting and overfitting
Challenge: We often don't know whether the function we are trying to approximate is linear, quadratic, ...
→ Choosing a model that is too simple can lead to underfitting
→ Choosing a model that is too complex can lead to overfitting
g should learn features that are inherent to all data, not just to the training data set
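A small illustration of both failure modes, assuming noisy data drawn from y = 2x. A constant predictor stands in for a model that is too simple, a least-squares line for an adequate one, and a nearest-neighbour lookup that memorizes the training set for a model that is too complex. These three models, the noise level and the sample sizes are assumptions made for this sketch.

```python
import random

rng = random.Random(0)


def make_data(n):
    """Noisy samples of the (assumed) true function y = 2x."""
    return [(x, 2.0 * x + rng.gauss(0.0, 0.1))
            for x in (rng.uniform(0.0, 1.0) for _ in range(n))]


train, test = make_data(30), make_data(30)


def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))


def constant_model(x):
    """Too simple: always predict the mean training label."""
    return mean_y


def linear_model(x):
    """Adequate: least-squares line through the training data."""
    return mean_y + slope * (x - mean_x)


def memorizing_model(x):
    """Too complex: return the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


for model in (constant_model, linear_model, memorizing_model):
    print(model.__name__, mse(model, train), mse(model, test))
```

The constant model has a large error on both sets (underfitting). The memorizing model has zero training error but a nonzero test error, because it has also learned the noise of the training set (overfitting).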
Generalization error
Generalization gap
Capacity
Underfitting zone
Overfitting zone
How can we reduce the generalization gap without reducing the model capacity?
Regularization
Example for a complexity penalty function
L2 norm of a weight matrix W ∈ ℝ^(m×n) (real-valued m×n matrix)
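A sketch of how such a penalty can enter the training objective. It assumes the common form "data loss plus λ times the squared L2 norm"; the weighting factor `lam` and the use of the squared norm are assumptions, since the notes only name the norm itself.

```python
import math


def l2_norm(weights):
    """L2 (Frobenius) norm of a weight matrix given as a list of rows."""
    return math.sqrt(sum(w * w for row in weights for w in row))


def regularized_loss(data_loss, weights, lam):
    """Data loss plus a complexity penalty that grows with the weight magnitudes."""
    return data_loss + lam * l2_norm(weights) ** 2


W = [[3.0, 0.0],
     [0.0, 4.0]]
print(l2_norm(W))                     # sqrt(9 + 16) = 5.0
print(regularized_loss(1.0, W, 0.1))  # 1.0 + 0.1 * 25
```

Large weights now increase the loss, so the optimizer is pushed toward smaller weights even though the number of parameters (the capacity) is unchanged.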
Artificial Intelligence (AI)
Machine Learning (ML)
Representation Learning
Deep Learning
Key Concepts of Deep Learning
Artificial Neural Network (ANN): Computing system containing nodes and weighted connections that converts inputs to outputs
Convolutional Neural Network (CNN): Popular class of ANNs for processing images
Recurrent Neural Network (RNN): Popular class of ANNs for processing sequences of inputs (e.g. natural language processing)
Backpropagation: Method that efficiently computes the gradient of the loss function with respect to the parameters of an ANN.
Artificial Neural Network
Basic idea:
Learned function 𝑔 is defined as a X consisting of X and X
Nodes are structured in X
At the beginning is an X, then a number of X and finally an X
network
vertices (nodes)
weighted edges (weighted graph)
layers
input layer
hidden layers
output layer
The basic building block of an ANN is an artificial neuron. It computes an activation 𝑎 based on some input 𝑋. The activation also depends on the learnable parameters (weights) and the choice of the activation function, which can also contain learnable parameters.
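A minimal sketch of a single artificial neuron, assuming the usual form "weighted sum of the inputs, plus a bias, passed through an activation function". The bias term and all names are assumptions; the notes mention only weights and the activation function.

```python
def relu(z):
    return max(0.0, z)


def neuron(inputs, weights, bias, activation):
    """Compute the activation a = activation(sum_i w_i * x_i + bias)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# z = 0.5 * 1.0 + (-1.0) * 2.0 + 0.25 = -1.25, so ReLU outputs 0.0
print(neuron([1.0, 2.0], [0.5, -1.0], 0.25, relu))

# z = 0.5 * 1.0 + 1.0 * 2.0 + 0.25 = 2.75, so ReLU outputs 2.75
print(neuron([1.0, 2.0], [0.5, 1.0], 0.25, relu))
```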
Activation functions
Rectified Linear Unit (ReLU)
Parametric Rectified Linear Unit (PReLU)
sigmoid function
step function
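The four activation functions listed above can be written in a few lines each. The value of the step function at exactly z = 0 is a convention and is assumed to be 1 here; `alpha` is the learnable slope of PReLU for negative inputs.

```python
import math


def relu(z):
    """Rectified Linear Unit: pass positive values, clip negatives to 0."""
    return max(0.0, z)


def prelu(z, alpha):
    """Parametric ReLU: like ReLU, but negatives are scaled by a learnable alpha."""
    return z if z > 0 else alpha * z


def sigmoid(z):
    """Smoothly squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    """Binary threshold at 0 (value at z == 0 assumed to be 1)."""
    return 1.0 if z >= 0 else 0.0


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), prelu(z, 0.1), sigmoid(z), step(z))
```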
What's the problem with Fully Connected Networks?
Not suited for problems with many input dimensions (e.g. computer vision)
Too many weights
Image size of 256x256x3 means 196,608 connections to each neuron in the first hidden layer
→ Model does not fit into memory
→ Slow training
Model takes too long to compute its output (slow inference) → Problematic for automotive applications where strict requirements concerning time and cost apply
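The weight count can be checked with a short calculation. The size of the first hidden layer (1000 neurons) and the storage format (4 bytes per weight, i.e. 32-bit floats) are assumptions added for illustration; only the image size comes from the notes.

```python
height, width, channels = 256, 256, 3
inputs_per_neuron = height * width * channels

hidden_neurons = 1000    # assumed size of the first hidden layer
bytes_per_weight = 4     # assumed 32-bit floating point weights

weights = inputs_per_neuron * hidden_neurons
megabytes = weights * bytes_per_weight / 1e6

print(inputs_per_neuron)  # 196608 connections per neuron
print(weights)            # 196608000 weights in the first layer alone
print(megabytes)          # roughly 786 MB
```

Under these assumptions the first layer alone needs several hundred megabytes, before any further layers, gradients or optimizer state are counted.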
Three characteristics of CNNs lead to their advantage over FCNs:
Recurrent Neural Networks
Backpropagation
Error at output of neural network is propagated backwards through the network to compute gradients of loss function w.r.t. each weight in the network
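A sketch of backpropagation on the smallest possible network: one input, one hidden sigmoid neuron and one linear output, with a squared error loss. The architecture, the loss and all numeric values are assumptions chosen so that every step of the chain rule stays visible. The result is compared against a finite-difference estimate of the gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    h = sigmoid(w1 * x)   # hidden activation
    y_hat = w2 * h        # network output
    return h, y_hat


def loss(x, y, w1, w2):
    _, y_hat = forward(x, w1, w2)
    return 0.5 * (y_hat - y) ** 2


def backward(x, y, w1, w2):
    """Propagate the output error backwards to get dL/dw1 and dL/dw2."""
    h, y_hat = forward(x, w1, w2)
    d_y_hat = y_hat - y                  # error at the output
    d_w2 = d_y_hat * h                   # gradient for the output weight
    d_h = d_y_hat * w2                   # error passed back to the hidden neuron
    d_w1 = d_h * h * (1.0 - h) * x       # through the sigmoid to the first weight
    return d_w1, d_w2


x, y, w1, w2 = 1.5, 0.3, 0.8, -0.6
d_w1, d_w2 = backward(x, y, w1, w2)

# Finite-difference check of both gradients.
eps = 1e-6
num_d_w1 = (loss(x, y, w1 + eps, w2) - loss(x, y, w1 - eps, w2)) / (2 * eps)
num_d_w2 = (loss(x, y, w1, w2 + eps) - loss(x, y, w1, w2 - eps)) / (2 * eps)
print(d_w1, num_d_w1)
print(d_w2, num_d_w2)
```

The backward pass reuses the intermediate values of the forward pass (`h`, `y_hat`), which is what makes the method efficient compared with perturbing each weight separately.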
Challenges of Deep Learning and real world data
Data quantity
Data quality
Data acquisition and cost