Why is it important that machines are able to learn?
Learning is essential for unknown environments –> i.e. when the designer lacks omniscience
Learning is useful as a system construction method –> expose the machine to reality, rather than trying to hardcode it
Learning modifies the machine's decision mechanisms to improve performance
What is learning?
learning is everything that comes through experience and modifies behavior
learning modifies the probability of a behavior in a certain situation
learning is a change of behavior that cannot be explained by maturation, injury, illness or predisposition
learning makes useful changes in our mind
learning modifies representations of what is experienced
learning helps to do the same task more efficiently and effectively the next time
What is Mitchell's definition of Machine Learning?
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Initial position:
task T
performance measurement P
experience E with the task
Goal:
generalize the experience in a way that allows you to improve your performance on the task
What are commonly used tasks for Machine Learning?
learning to play games, like Backgammon
recognizing spam mail
recognizing handwritten characters
learning to classify stars
market basket analysis
What are different machine learning scenarios?
supervised learning –> correct answers for each example
unsupervised learning –> correct answers are not given
reinforcement learning –> occasional rewards for good actions
What is inductive learning and how does it work?
inductive learning is the simplest form of learning: it learns a function from examples
Initial state:
f is the (unknown) target function
An example is a pair (x, f(x))
Goal: find a hypothesis h
given a training set of examples
such that h is approximately equal to f
Algorithm:
construct/adjust h to agree with f on the training set
h should not overfit the training set
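A minimal sketch of inductive learning, assuming a linear hypothesis space and least-squares fitting (both are choices made here for illustration, not part of the notes). The names `fit_line` and `examples` are hypothetical.

```python
def fit_line(examples):
    """Fit h(x) = a*x + b to (x, f(x)) pairs by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in examples)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return lambda x: a * x + b

# Training set: pairs (x, f(x)) of a target the learner does not see directly.
# For this illustration the data were generated from f(x) = 2x + 1.
examples = [(0, 1), (1, 3), (2, 5), (3, 7)]
h = fit_line(examples)
print(h(10))  # h applied to an input that was not in the training set
```

The hypothesis h agrees with f on the training set and is then used on unseen inputs, which is the generalization step the goal asks for.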
What is the pigeon experiment about?
In the pigeon experiment, pigeons were trained to distinguish between paintings by different master painters
If their choice was correct, they got a reward
They achieved high accuracy on the training set as well as on the test set –> the pigeons really recognized patterns and styles rather than just memorizing the training images
This sort of learning is possible through neural networks
How can performance of a hypothesis h be measured?
Use theorems of computational/statistical learning theory
try h on a new test set of examples where f is known
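Trying h on a test set can be sketched as follows. The function `accuracy`, the hypothesis and the data are hypothetical; accuracy is only one possible performance measure.

```python
def accuracy(h, test_set):
    """Fraction of test examples (x, f(x)) on which h(x) equals the known f(x)."""
    correct = sum(1 for x, fx in test_set if h(x) == fx)
    return correct / len(test_set)

# Hypothetical hypothesis: label a number +1 if it is positive, else -1.
h = lambda x: 1 if x > 0 else -1

# Held-out examples with known f(x); the last label disagrees with h on purpose.
test_set = [(2, 1), (-3, -1), (5, 1), (0, 1)]
print(accuracy(h, test_set))  # 0.75
```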
What are Neural Networks?
they model the brain and nervous system
they are based on very simple principles, but they behave in a very complex way
they can process information in parallel
they have the ability to learn
applications:
problem solving
biological models
How does a biological neuron work and how can it be modelled?
a neuron is a tiny unit in the brain that is connected to other neurons via synapses
if a neuron is activated, it spreads its activation to all connected neurons
an artificial neuron can be modelled in the following way:
it has several inputs, each with an associated weight
the input function sums up the weighted inputs
the activation function is then applied to this sum
if the sum passes a certain threshold the neuron is activated, if not it remains inactive
the output is set accordingly
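The model above can be sketched in a few lines. The function name `neuron`, the 0/1 output coding and the example numbers are assumptions made for illustration.

```python
def neuron(inputs, weights, threshold):
    """Weighted sum of the inputs followed by a hard threshold activation."""
    total = sum(w * x for w, x in zip(weights, inputs))  # input function
    return 1 if total >= threshold else 0                # activation function

# The weighted sum is 0.5 + 0.4 = 0.9, which passes the threshold 0.8.
print(neuron([1, 0, 1], [0.5, 0.9, 0.4], threshold=0.8))  # 1
```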
What is a perceptron, how does it predict, and what is its relation to Boolean functions?
a perceptron is a single node in an artificial neural network
it has an activation function with a certain threshold, which determines whether it is activated or not
typical outputs are +1 and -1
a perceptron can implement the logical functions AND, OR and NOT
but functions such as XOR cannot be modelled, because their positive and negative examples are not linearly separable
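As an illustration, the logical functions can be realized with hand-picked weights. The weights below are one possible choice among many, and the helper `perceptron` is hypothetical; inputs are coded as 0/1 and outputs as +1/-1.

```python
def perceptron(weights, bias):
    """Return a function that outputs +1 if w.x + bias > 0, else -1."""
    def predict(*xs):
        s = bias + sum(w * x for w, x in zip(weights, xs))
        return 1 if s > 0 else -1
    return predict

AND = perceptron([1, 1], -1.5)  # fires only if both inputs are 1
OR = perceptron([1, 1], -0.5)   # fires if at least one input is 1
NOT = perceptron([-1], 0.5)     # fires only if the input is 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
```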
What is the perceptron learning rule for supervised learning?
Wj ← Wj + α · (f(x) − h(x)) · xj
f(x) is the underlying target function
h(x) is the hypothesis, i.e. the current output of the perceptron
α is the learning rate
Wj is the weight of the corresponding input
xj is the corresponding input
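A minimal sketch of training with this rule, using the symbols above. It assumes 0/1 outputs, a constant bias input x0 = 1 and the hypothetical names `train_perceptron` and `predict`; the learning rate and number of epochs are arbitrary.

```python
def predict(w, x):
    """Threshold unit: h(x) = 1 if the weighted sum is positive, else 0."""
    xs = [1] + list(x)  # x0 = 1 acts as the bias input
    return 1 if sum(wj * xj for wj, xj in zip(w, xs)) > 0 else 0

def train_perceptron(examples, alpha=0.1, epochs=20):
    """Apply Wj <- Wj + alpha * (f(x) - h(x)) * xj to every example, several times."""
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for x, fx in examples:
            hx = predict(w, x)
            for j, xj in enumerate([1] + list(x)):
                w[j] += alpha * (fx - hx) * xj  # no change when h(x) == f(x)
    return w

or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(or_examples)
print([predict(w, x) for x, _ in or_examples])  # [0, 1, 1, 1]
```

The weights only move when the prediction is wrong, and they move in the direction that reduces the error on that example.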
How can we minimize the error?
The error function for one training example may be considered as a function in a multi-dimensional space
The best weight setting for one example is where the error measure for this example is minimal
In order to find the point with the minimal error –> find the minimum of the error function
We can find the minimum with gradient descent –> making small steps “downhill”
To compute this, we need a continuous and differentiable activation function g
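The idea of taking small steps downhill can be sketched on a one-dimensional example. The error function E(w) = (w - 3)^2, the step size and the name `gradient_descent` are assumptions chosen for illustration; a real network has one such dimension per weight.

```python
def gradient_descent(grad, w0, step=0.1, iterations=100):
    """Repeatedly move w a small step against the gradient of the error."""
    w = w0
    for _ in range(iterations):
        w -= step * grad(w)
    return w

# Hypothetical error function E(w) = (w - 3)^2 with gradient dE/dw = 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_min, 4))
```

The result approaches w = 3, the minimum of this error function.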
What are multilayer perceptrons?
Perceptrons may have multiple output nodes
The output nodes may be combined with other perceptrons
Therefore there is one input layer, one or several hidden layers and one output layer
Information flow is unidirectional
Information is distributed
Information processing is parallel
How does backpropagation generate error signals for the intermediate layers?
The output nodes are trained like a normal perceptron –> minimize error function
Δi is the error term of output node i: its error times the derivative of the activation function at its input
the error term Δi of the output layer is propagated back to the hidden layer and so forth
Thus the information provided by the gradient flows backwards through the network –> backpropagation
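A minimal sketch of one backpropagation step for a network with one hidden layer and a single output node. It assumes sigmoid activations and squared error; the network size, learning rate, random initial weights and all function names are hypothetical.

```python
import math
import random

def g(z):
    """Sigmoid activation; its derivative is g(z) * (1 - g(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hid, W_out):
    hidden = [g(sum(w * xi for w, xi in zip(ws, x))) for ws in W_hid]
    output = g(sum(w * h for w, h in zip(W_out, hidden)))
    return hidden, output

def backprop_step(x, target, W_hid, W_out, alpha=0.5):
    hidden, output = forward(x, W_hid, W_out)
    # Output node: error times the derivative of the activation function.
    delta_out = (target - output) * output * (1 - output)
    # Hidden nodes: delta_out is propagated back along the connecting weights.
    delta_hid = [h * (1 - h) * w * delta_out for h, w in zip(hidden, W_out)]
    for j, h in enumerate(hidden):
        W_out[j] += alpha * h * delta_out
    for j, ws in enumerate(W_hid):
        for k, xk in enumerate(x):
            ws[k] += alpha * xk * delta_hid[j]

random.seed(0)
W_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W_out = [random.uniform(-1, 1) for _ in range(2)]
x, target = [1.0, 0.0, 1.0], 1.0  # the first input is a constant bias input

before = forward(x, W_hid, W_out)[1]
for _ in range(200):
    backprop_step(x, target, W_hid, W_out)
after = forward(x, W_hid, W_out)[1]
print(before, after)  # the output moves towards the target
```

The hidden error terms are computed from the old output weights before any weight is updated, so both layers are adjusted with respect to the same forward pass.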
What is overfitting?
Overfitting is the situation where:
the training set error continues to decrease with an increasing number of training examples / number of epochs
while the test set error starts to increase
What is Deep Learning and what are the key ingredients?
In recent years, great success has been observed with training “deep” neural networks
Successes in particular in image classification
Key ingredients:
big data
fast processing
unsupervised pre-training of layers
What is convolution of an image?
for each pixel of an image, a new feature is computed as a weighted combination of its n×n neighborhood
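A minimal sketch of this operation on an image stored as a list of rows. It assumes that border pixels without a full neighborhood are skipped and that the kernel is applied without flipping, as is common in neural networks; `convolve` and the averaging kernel are hypothetical examples.

```python
def convolve(image, kernel):
    """Apply an n x n kernel to every pixel whose full neighborhood lies inside the image."""
    n = len(kernel)
    r = n // 2
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(r, rows - r):
        row = []
        for j in range(r, cols - r):
            # Weighted combination of the n x n neighborhood around pixel (i, j).
            row.append(sum(kernel[a][b] * image[i - r + a][j - r + b]
                           for a in range(n) for b in range(n)))
        out.append(row)
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
mean_kernel = [[1 / 9] * 3 for _ in range(3)]  # 3x3 averaging filter
print(convolve(image, mean_kernel))
```

With the averaging kernel each output value is the mean of a 3x3 block; other weights give edge detectors or other features, and in a convolutional network these weights are learned.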
What are recurrent neural networks?
this kind of neural network can process sequential data by feeding back the output as part of the next input
Long Short-Term Memory (LSTM) allows RNNs to forget
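The feedback loop can be sketched with a single recurrent unit. This is a plain recurrent step, not an LSTM; the scalar weights, the tanh activation and the names `rnn_step` and `run_rnn` are assumptions for illustration.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new output depends on the input and the previous output."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # the output is fed back in at the next step
        states.append(h)
    return states

print(run_rnn([1.0, 0.0, 0.0]))
```

The outputs for the second and third step are non-zero although their inputs are zero: the first input is still remembered through the feedback and fades gradually.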
Do neural networks always work out well?
Neural networks are good for problems where the final output depends on combinations of many input features
But they don’t perform well if explicit representations of the learned concept are needed
Neural networks are not always reliable, e.g. a network may classify an image as a panda but, after the image is slightly perturbed, as a gibbon
How is the design of a learning element affected?
by which components of the performance element are to be learned
by the feedback that is available to learn these components
by the representation that is used for the components
What does it mean for a function that models a hypothesis to be consistent?
A hypothesis h is consistent if it agrees with the underlying function f on all examples
How can we avoid overfitting?
Ockham's Razor –> The best explanation is the simplest explanation that fits the data
maximize a combination of consistency and simplicity
keep a separate validation set (different from the test and training sets) and stop the training if the validation error goes up
How can the error of a network be measured?
The error is measured as the squared error, i.e. by squaring the difference f(x) − h(x) used in the perceptron learning rule
What is the problem with a threshold activation function when we want to find the minimal error, and which function helps in such cases?
A threshold activation function is not differentiable at the threshold, so gradient descent cannot be applied to it
One can use the sigmoid function instead, which is continuous and differentiable
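For comparison, a short sketch of both activation functions. It assumes the standard logistic sigmoid 1 / (1 + e^(-z)), whose derivative is g(z) * (1 - g(z)); the function names are hypothetical.

```python
import math

def threshold(z):
    """Hard threshold: jumps from 0 to 1 at z = 0, so it has no usable gradient."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth replacement for the threshold function."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid, needed for gradient descent."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-2.0, 0.0, 2.0):
    print(z, threshold(z), round(sigmoid(z), 3), round(sigmoid_prime(z), 3))
```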
What are potential problems when minimizing the error by gradient descent?
The error function may have several local minima, so gradient descent can get stuck in one of them and it is often very difficult to find the global minimum
Minimizing the error on the training set may destroy the performance on the test set (overfitting)
What is the relation between deep learning and AI?
Machine learning is a subfield of AI and deep learning is a subfield of machine learning. Deep learning is a form of machine learning that makes use of artificial neural networks.
How to define Machine Learning?
(Mitchell 1997): “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
What is overfitting and how can you avoid it?
Training Set Error continues to decrease with increasing number of training examples / number of epochs. An epoch is a complete pass through all training examples
Test Set Error will start to increase because of overfitting
Simple training protocol:
keep a separate validation set to watch the performance. Validation set is different from training and test sets!
stop training if the error on the validation set goes up
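The training protocol can be sketched as follows, assuming training stops as soon as the validation error fails to improve for a given number of epochs. The function `early_stopping_epoch`, the `patience` parameter and the error values are hypothetical.

```python
def early_stopping_epoch(validation_errors, patience=1):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    bad_epochs = 0
    for epoch, err in enumerate(validation_errors):
        if err < best:
            best, bad_epochs = err, 0   # validation error still improving
        else:
            bad_epochs += 1             # validation error went up (or stalled)
            if bad_epochs >= patience:
                return epoch
    return len(validation_errors) - 1   # never triggered: train to the end

# Hypothetical per-epoch validation errors: they fall, then rise as overfitting sets in.
errors = [0.40, 0.31, 0.27, 0.25, 0.26, 0.29]
print(early_stopping_epoch(errors))  # 4
```

In practice one would keep the weights from the epoch with the lowest validation error (epoch 3 here) rather than the weights at the stopping epoch.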
How can you optimize weights in a perceptron?
With the perceptron learning rule for supervised learning
What role does the activation function play in natural and in machine neural networks?
The activation function has a certain threshold: if that threshold is passed the neuron is activated, if not it remains inactive
It works the same for natural and machine neural networks
What are the limitations of a single perceptron?
A single perceptron is a linear classifier, so it cannot solve tasks that are not linearly separable (e.g. XOR).
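The limitation can be illustrated by searching for suitable weights. The grid search below is only an illustration, not a proof, and the names `separable`, `grid`, `AND` and `XOR` are hypothetical.

```python
from itertools import product

def separable(truth_table, grid):
    """Search a grid of (w1, w2, bias) for a threshold unit that reproduces the table."""
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in truth_table.items()):
            return (w1, w2, b)
    return None

grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print(separable(AND, grid))  # some weight setting that implements AND
print(separable(XOR, grid))  # None: no setting on the grid implements XOR
```

For XOR the search fails for every weight setting, not just those on the grid, because no straight line separates the two classes.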
What is the typical structure of an artificial neural network?
The output nodes of perceptrons may be fed into further perceptrons, which creates a hidden layer between the input layer and the output layer
The size of this hidden layer is determined manually
What are the key ideas of convolutional neural networks and recurrent neural networks?
Convolutional Neural Networks
Convolution:
Convolutions can be encoded as network layers
each 3x3 block of pixels in the input image is connected to the corresponding pixel in the next layer
Convolutional Layers are at the heart of Image Recognition
Several convolutional layers are stacked on top of each other and arranged in parallel to each other
Recurrent Neural Networks (RNN)
allow sequential data to be processed by feeding the output of the network back into the next input
Long Short-Term Memory (LSTM)
adds “forgetting” to RNNs
good for mapping sequential input data into sequential output data, e.g.: text to text, or time series to time series