Why is it important that machines are able to learn?

Learning is essential for unknown environments –> when the designer does not have omniscience

Learning is useful as a system construction method –> expose the machine to reality, rather than trying to hardcode it

Learning modifies the machine's decision mechanisms to improve performance

What is learning?

learning is everything that comes through experience and modifies behavior

learning modifies the probability of a behavior in a certain situation

learning is a change of behavior that cannot be explained by maturation, injuries, sickness or predisposition

learning makes useful changes in our mind

learning modifies representations of what is experienced

learning helps to do the same task more efficiently and effectively the next time

What is Mitchell's definition of Machine Learning?

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Initial position:

task T

performance measurement P

experience E with the task

Goal:

generalize the experience in a way that improves performance on the task

What are commonly used tasks for Machine Learning?

learning to play games, like Backgammon

recognizing spam mail

recognizing handwritten characters

learning to classify stars

market basket analysis

What are different machine learning scenarios?

supervised learning –> correct answers for each example

unsupervised learning –> correct answers are not given

reinforcement learning –> occasional rewards for good actions

What is inductive learning and how does it work?

inductive learning is the simplest form of learning: it learns a function from examples

Initial state:

f is the (unknown) target function

An example is a pair (x, f(x))

Goal: find a hypothesis h

learned from a given training set of examples

such that h is approximately equal to f

Algorithm:

construct/adjust h to agree with f on the training set

h should not overfit the training set
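The inductive learning scheme can be sketched in a few lines. This is a minimal illustration under assumptions not in the notes: the target function f(x) = 2x + 1 is hypothetical, the hypothesis space is straight lines h(x) = w·x + b, and h is constructed by ordinary least squares, which is just one possible way to "construct/adjust h".

```python
def f(x):
    # Hypothetical target function; the learner only sees its examples.
    return 2.0 * x + 1.0

# Training set: pairs (x, f(x))
training_set = [(x, f(x)) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

def fit_line(examples):
    """Construct h(x) = w*x + b minimizing the squared error on the examples."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    var = sum((x - mean_x) ** 2 for x, _ in examples)
    w = cov / var
    b = mean_y - w * mean_x
    return lambda x: w * x + b

h = fit_line(training_set)
print(h(10.0))  # 21.0, which equals f(10.0) on an input never seen in training
```

Here the hypothesis space happens to contain f, so h agrees with f everywhere; with noisy data or a poorer hypothesis space, h would only approximate f.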

What is the pigeon experiment about?

In the pigeon experiment, pigeons were trained to distinguish paintings by different great masters of art

If their choice was correct, they got a reward

They achieved high accuracy on the training set as well as on the test set –> the pigeons really recognized patterns and artistic styles rather than just memorizing the pictures

This sort of learning is possible through neural networks

How can performance of a hypothesis h be measured?

Use theorems of computational/statistical learning theory

try h on a new test set of examples where f is known
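Trying h on a test set boils down to comparing h(x) with the known f(x) on examples that were not used for training. A minimal sketch, assuming accuracy as the measure for classification and mean squared error for numeric outputs; the hypotheses h_class and h_reg and both test sets are hypothetical.

```python
def accuracy(h, examples):
    """Fraction of test examples where h reproduces the known value f(x)."""
    return sum(1 for x, fx in examples if h(x) == fx) / len(examples)

def mean_squared_error(h, examples):
    """Average squared difference between h(x) and the known value f(x)."""
    return sum((h(x) - fx) ** 2 for x, fx in examples) / len(examples)

# Hypothetical learned classifier: "is x larger than 5?"
h_class = lambda x: x > 5
# Test examples NOT used during training; f(x) is known for each
test_set = [(1, False), (4, False), (6, True), (9, True), (5.5, False)]
print(accuracy(h_class, test_set))  # 0.8 (only x = 5.5 is misclassified)

# Hypothetical learned regression hypothesis and its test set
h_reg = lambda x: 2 * x
test_set_reg = [(1, 2), (2, 5)]
print(mean_squared_error(h_reg, test_set_reg))  # 0.5
```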

What are Neural Networks?

they model the brain and nervous system

they are based on very simple principles, but they behave in a very complex way

they can process information in parallel

they have the ability to learn

applications:

problem solving

biological models

How does a biological neuron work and how can it be modelled?

neurons are tiny units in our brain that are connected to other neurons via synapses

if a neuron is activated, it spreads its activation to all connected neurons

an artificial neuron can be modelled in the following way:

it has several inputs, each with an associated weight

the input function sums up all inputs multiplied by their weights

the activation function is then applied to this sum

if the sum exceeds a certain threshold, the neuron is activated; if not, it remains inactive

the output of the neuron is set accordingly
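The artificial neuron model maps directly onto code. A minimal sketch, assuming outputs of 1 (activated) and 0 (inactive) and a strict "greater than" comparison with the threshold; the weights and threshold in the usage lines are made up.

```python
def step(total, threshold):
    """Threshold activation: fire (1) if the summed input exceeds the threshold."""
    return 1 if total > threshold else 0

def neuron(inputs, weights, threshold):
    # Input function: weighted sum of all inputs
    total = sum(w * x for w, x in zip(weights, inputs))
    # Activation function decides whether the neuron fires
    return step(total, threshold)

print(neuron([1, 0, 1], [0.5, 0.9, 0.4], threshold=0.8))  # 1 (0.9 > 0.8)
print(neuron([0, 0, 1], [0.5, 0.9, 0.4], threshold=0.8))  # 0 (0.4 <= 0.8)
```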

What is a perceptron, how does it make predictions, and what is its relation to Boolean functions?

a perceptron is a single node in an artificial neural network

it has an activation function with a threshold that determines whether it is activated or not

typical outputs are +1 and -1

a perceptron can implement the logical functions AND, OR and NOT

but more complex functions like XOR cannot be represented, because XOR is not linearly separable
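The relation to Boolean functions can be checked directly. In this sketch the threshold is written as a bias term (bias = minus the threshold), inputs are 0/1 and outputs are +1/-1; the specific weights are one hand-picked choice among many. A brute-force search over a coarse weight grid finds no perceptron for XOR, consistent with XOR not being linearly separable (the grid search only illustrates this, it does not prove it).

```python
import itertools

def perceptron(weights, bias, inputs):
    """Output +1 if the weighted sum plus bias is positive, else -1."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else -1

def and_gate(a, b):
    return perceptron([1, 1], -1.5, [a, b])

def or_gate(a, b):
    return perceptron([1, 1], -0.5, [a, b])

def not_gate(a):
    return perceptron([-1], 0.5, [a])

def truth_table(fn, arity):
    # Rows in the order (0,0), (0,1), (1,0), (1,1) for arity 2
    return [fn(*xs) for xs in itertools.product([0, 1], repeat=arity)]

print(truth_table(and_gate, 2))  # [-1, -1, -1, 1]
print(truth_table(or_gate, 2))   # [-1, 1, 1, 1]
print(truth_table(not_gate, 1))  # [1, -1]

# Search weights and bias in -3.0, -2.5, ..., 3.0 for a perceptron computing XOR
xor_target = [-1, 1, 1, -1]
grid = [i / 2 for i in range(-6, 7)]
found = any(
    truth_table(lambda a, b: perceptron([w1, w2], bias, [a, b]), 2) == xor_target
    for w1, w2, bias in itertools.product(grid, repeat=3)
)
print(found)  # False
```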

What is the perceptron learning rule for supervised learning?

Wj ← Wj + α · (f(x) − h(x)) · xj

f(x) is the underlying target function (the correct output)

h(x) is the hypothesis, i.e. the current output of the perceptron

α is the learning rate

Wj is the weight of input j

xj is the corresponding input
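The update Wj ← Wj + α · (f(x) − h(x)) · xj can be turned into a small training loop. A minimal sketch, assuming the threshold is learned as an extra weight W0 on a constant input x0 = 1, outputs of +1/-1, all weights starting at 0, α = 0.1 and a fixed number of epochs; the OR training set is illustrative.

```python
def predict(weights, x):
    # x[0] is a constant 1, so weights[0] plays the role of the (negated) threshold
    total = sum(w * xj for w, xj in zip(weights, x))
    return 1 if total > 0 else -1

def train_perceptron(examples, alpha=0.1, epochs=20):
    n = len(examples[0][0])
    weights = [0.0] * n
    for _ in range(epochs):
        for x, fx in examples:
            hx = predict(weights, x)
            # Perceptron learning rule: W_j <- W_j + alpha * (f(x) - h(x)) * x_j
            for j in range(n):
                weights[j] += alpha * (fx - hx) * x[j]
    return weights

# OR function with a leading constant input 1; targets are +1 / -1
or_examples = [([1, 0, 0], -1), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = train_perceptron(or_examples)
print([predict(w, x) for x, _ in or_examples])  # [-1, 1, 1, 1]
```

Weights only change when h(x) differs from f(x), so once every training example is classified correctly the weights stay fixed.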

How can we minimize the error?

The error function for one training example may be considered as a function in a multi-dimensional space

The best weight setting for one example is where the error measure for this example is minimal

In order to find the point with the minimal error –> find the minimum of the error function

We can find the minimum with gradient descent –> making small steps “downhill”

To compute the gradient, we need a continuous and differentiable activation function g
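Gradient descent with a differentiable activation function can be sketched for a single unit. This sketch assumes the sigmoid as g (so g'(in) = g(in) · (1 − g(in))), the per-example squared error ½ (f(x) − h(x))², updates after every example, a constant bias input of 1, α = 0.5 and 2000 epochs; the OR data with 0/1 targets are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output(weights, x):
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)))

def squared_error(weights, examples):
    # E = 1/2 * sum over examples of (f(x) - h(x))^2
    return 0.5 * sum((fx - output(weights, x)) ** 2 for x, fx in examples)

def gradient_descent(examples, alpha=0.5, epochs=2000):
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, fx in examples:
            hx = output(weights, x)
            err = fx - hx
            g_prime = hx * (1.0 - hx)  # sigmoid derivative at the summed input
            for j in range(len(weights)):
                # small step downhill: W_j <- W_j + alpha * Err * g'(in) * x_j
                weights[j] += alpha * err * g_prime * x[j]
    return weights

or_examples = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
start = [0.0, 0.0, 0.0]
w = gradient_descent(or_examples)
print(squared_error(start, or_examples), squared_error(w, or_examples))
print([round(output(w, x)) for x, _ in or_examples])
```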

What are multilayer perceptrons?

Perceptrons may have multiple output nodes

The outputs of perceptrons may be used as inputs to other perceptrons

This gives one input layer, one or several hidden layers and one output layer

Information flow is unidirectional (feed-forward)

Information is distributed

Information processing is parallel
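A tiny feed-forward network shows how one hidden layer overcomes the XOR limitation of a single perceptron. The weights here are set by hand, not learned, and the units use a 0/1 threshold activation: one hidden node computes OR, the other AND, and the output node computes "OR and not AND".

```python
def step(total):
    return 1 if total > 0 else 0

def layer(inputs, weight_rows, biases):
    """One fully connected layer: every node sees all inputs of the layer."""
    return [step(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weight_rows, biases)]

def xor_network(a, b):
    # Hidden layer: node 1 computes OR, node 2 computes AND
    hidden = layer([a, b], [[1, 1], [1, 1]], [-0.5, -1.5])
    # Output layer: OR and not AND, which is XOR
    (out,) = layer(hidden, [[1, -1]], [-0.5])
    return out

print([xor_network(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Information flows strictly from the input layer through the hidden layer to the output node, and both hidden nodes can be computed independently of each other.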

How does backpropagation generate error signals for the intermediate layers?

The output nodes are trained like a normal perceptron –> minimize error function

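Δi is the error term of output node i times the derivative of the activation function at its input: Δi = Erri · g'(ini)

the error terms Δi of the output layer are propagated back to the hidden layer: each hidden node j gets Δj = g'(inj) · Σi Wj,i · Δi, and so forth for earlier layers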

Thus the information provided by the gradient flows backwards through the network –> backpropagation
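Backpropagation for a network with one hidden layer can be sketched as follows, assuming sigmoid units, a single output node and the per-example error E = ½ (y − o)². The output node gets Δ = (y − o) · g'(in), each hidden node j gets Δj = g'(inj) · Wj · Δ, and each weight moves by α · Δ · (its input). The hidden-layer size (3), learning rate, number of epochs and random seed are arbitrary choices; whether training on XOR reaches near-zero error depends on the initialization, because gradient descent can end in a local minimum.

```python
import math
import random

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, W2, x):
    # x already contains a constant bias input 1.0 as its first entry
    hidden = [g(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    hidden_b = [1.0] + hidden              # bias input for the output node
    o = g(sum(w * a for w, a in zip(W2, hidden_b)))
    return hidden, hidden_b, o

def backprop(W1, W2, x, y):
    """Gradients of E = 1/2 (y - o)^2 w.r.t. W1 and W2 for one example."""
    hidden, hidden_b, o = forward(W1, W2, x)
    delta_o = (y - o) * o * (1.0 - o)      # output node: Err * g'(in)
    grad_W2 = [-delta_o * a for a in hidden_b]
    grad_W1 = []
    for j, a in enumerate(hidden):
        # error signal sent back through weight W2[j + 1] (W2[0] is the bias)
        delta_j = a * (1.0 - a) * W2[j + 1] * delta_o
        grad_W1.append([-delta_j * xi for xi in x])
    return grad_W1, grad_W2

def train(examples, n_hidden=3, alpha=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in examples:
            gW1, gW2 = backprop(W1, W2, x, y)
            # gradient descent step: move every weight against its gradient
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= alpha * gW1[j][i]
            for j in range(n_hidden + 1):
                W2[j] -= alpha * gW2[j]
    return W1, W2

def total_error(W1, W2, examples):
    return sum(0.5 * (y - forward(W1, W2, x)[2]) ** 2 for x, y in examples)

xor = [([1.0, 0, 0], 0), ([1.0, 0, 1], 1), ([1.0, 1, 0], 1), ([1.0, 1, 1], 0)]
W1, W2 = train(xor)
print(total_error(W1, W2, xor))
print([round(forward(W1, W2, x)[2]) for x, _ in xor])
```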

What is overfitting?

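Overfitting describes the situation where:

the training set error continues to decrease with an increasing number of training examples / number of epochs

while the test set error starts to increase, i.e. the network fits peculiarities (noise) of the training data instead of the underlying function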
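Overfitting can be reproduced with two hypotheses on hypothetical data: noisy examples of f(x) = x at x = 0 to 9, where the noise alternates between +1 and -1. The complex hypothesis is the polynomial passing exactly through all ten training points (Lagrange interpolation); the simple one is a least-squares straight line. Both the data and the choice of hypotheses are illustrative assumptions.

```python
def f(x):
    # Hypothetical underlying target function
    return x

# Training examples of f with alternating noise (+1, -1, +1, ...)
train = [(float(i), f(i) + (1.0 if i % 2 == 0 else -1.0)) for i in range(10)]
# Noise-free test examples lying between the training points
test = [(i + 0.5, f(i + 0.5)) for i in range(9)]

def interpolate(examples):
    """Complex hypothesis: the polynomial passing exactly through every example."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(examples):
            term = yi  # Lagrange basis polynomial for point i, scaled by yi
            for j, (xj, _) in enumerate(examples):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h

def fit_line(examples):
    """Simple hypothesis: least-squares straight line."""
    n = len(examples)
    mx = sum(x for x, _ in examples) / n
    my = sum(y for _, y in examples) / n
    w = (sum((x - mx) * (y - my) for x, y in examples)
         / sum((x - mx) ** 2 for x, _ in examples))
    return lambda x: w * x + (my - w * mx)

def mse(h, examples):
    return sum((h(x) - y) ** 2 for x, y in examples) / len(examples)

poly, line = interpolate(train), fit_line(train)
for name, h in (("polynomial", poly), ("line", line)):
    print(name, "train error:", round(mse(h, train), 3), "test error:", round(mse(h, test), 3))
```

The polynomial has zero training error but oscillates strongly between the training points, so its test error is far larger than that of the line, whose training error stays around the noise level.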

What is Deep Learning and what are the key ingredients?

In recent years, great success has been achieved by training “deep” neural networks (networks with many layers)

Successes have been particularly strong in image classification

Key ingredients:

big data

fast processing

unsupervised pre-training of layers

What is convolution of an image?

for each pixel of an image, a new feature is computed using a weighted combination of its nxn neighborhood
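The per-pixel weighted combination can be written as a plain loop. A minimal sketch under assumptions: border pixels without a full n×n neighborhood are simply skipped, the kernel is not flipped (strictly a cross-correlation, which is how CNN layers usually apply it), and the image and the edge-detecting kernel are made up.

```python
def convolve(image, kernel):
    """New feature for every interior pixel: weighted sum of its n x n neighborhood."""
    n = len(kernel)
    r = n // 2
    height, width = len(image), len(image[0])
    out = []
    for y in range(r, height - r):
        row = []
        for x in range(r, width - r):
            total = 0
            for dy in range(n):
                for dx in range(n):
                    total += kernel[dy][dx] * image[y + dy - r][x + dx - r]
            row.append(total)
        out.append(row)
    return out

# 4x5 image with a vertical edge between dark (0) and bright (9) columns
image = [[0, 0, 9, 9, 9]] * 4
# Hypothetical 3x3 kernel responding to left-to-right brightness changes
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
print(convolve(image, edge_kernel))  # [[27, 27, 0], [27, 27, 0]]
```

The output is large where the neighborhood contains the edge and 0 in the uniformly bright region.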

What are recurrent neural networks?

These neural networks can process sequential data by feeding their output back in as part of the next input

Long Short-Term Memory (LSTM) networks allow RNNs to forget
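The feedback loop of a recurrent network can be sketched with a single unit whose previous output is fed back together with the next input element. The weights w_in and w_rec, the tanh activation and the zero initial state are hypothetical choices, and the LSTM gating mechanism is not included.

```python
import math

def rnn(sequence, w_in=0.5, w_rec=0.8, bias=0.0):
    """Minimal recurrent unit: the previous output is fed back as extra input."""
    state = 0.0
    outputs = []
    for x in sequence:
        state = math.tanh(w_in * x + w_rec * state + bias)
        outputs.append(state)
    return outputs

print(rnn([1, 0, 0]))
print(rnn([0, 0, 1]))
```

Both sequences contain the same values, yet the outputs differ, because the unit carries information about earlier elements forward in time.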

Do neural networks always work well?

Neural networks are good for problems where the final output depends on combinations of many input features

But they do not perform well if explicit representations of the learned concept are needed

Neural networks are not always reliable, e.g. an image correctly classified as a panda can be classified as a gibbon after a small, carefully chosen noise pattern (almost invisible to humans) is added

What affects the design of a learning element?

which components of the performance element are to be learned

what feedback is available to learn these components

what representation is used for the components

What does it mean for a hypothesis h to be consistent?

A hypothesis h is consistent if it agrees with the underlying function f on all training examples

How can we avoid overfitting?

Ockham's razor –> the best explanation is the simplest explanation that fits the data

maximize a combination of consistency and simplicity

keep a separate validation set (different from the test and training sets) and stop training when the error on it goes up
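Stopping when the validation error goes up can be sketched as a generic loop. Here train_one_epoch and validation_error are hypothetical callables supplied by the caller, and the patience parameter (how many epochs without improvement to tolerate before stopping) is an added assumption to avoid stopping on a single noisy uptick.

```python
def early_stopping(train_one_epoch, validation_error, max_epochs=100, patience=3):
    """Train until the validation error has not improved for `patience` epochs.

    Returns the epoch with the lowest validation error and that error.
    """
    best_epoch, best_error = 0, validation_error()
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        err = validation_error()
        if err < best_error:
            best_epoch, best_error = epoch, err
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error keeps going up: stop training
    return best_epoch, best_error

# Simulated validation errors after epochs 0, 1, 2, ...: falling, then rising
simulated = iter([0.9, 0.6, 0.4, 0.35, 0.38, 0.45, 0.5, 0.6, 0.7])
print(early_stopping(lambda: None, lambda: next(simulated)))  # (3, 0.35)
```

In practice one would also save the weights at the best epoch and restore them after stopping.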

How can the error of a network be measured?

The error is measured as the squared error term of the perceptron learning rule: E = ½ · Err² = ½ · (f(x) − h(x))² (the factor ½ only simplifies the derivative)
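Differentiating the squared error shows where the gradient descent update comes from. This worked derivation assumes a single unit with h(x) = g(in), in = Σj Wj xj, Err = f(x) − h(x) and learning rate α; for a threshold unit, dropping g'(in) gives back the perceptron learning rule.

```latex
\begin{aligned}
E &= \tfrac{1}{2}\,\mathrm{Err}^2 = \tfrac{1}{2}\bigl(f(x) - h(x)\bigr)^2,
  \qquad h(x) = g(\mathit{in}),\quad \mathit{in} = \sum_j W_j x_j \\
\frac{\partial E}{\partial W_j} &= \mathrm{Err}\cdot\frac{\partial \mathrm{Err}}{\partial W_j}
  = -\,\mathrm{Err}\cdot g'(\mathit{in})\cdot x_j \\
W_j &\leftarrow W_j - \alpha\,\frac{\partial E}{\partial W_j}
  = W_j + \alpha\cdot \mathrm{Err}\cdot g'(\mathit{in})\cdot x_j
\end{aligned}
```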

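What is the problem with a threshold activation function when we want to find the minimal error, and which function helps in such cases?

A threshold activation function is not differentiable at the threshold, and its derivative is zero everywhere else, so gradient descent gets no information about which direction reduces the error

One can use the sigmoid function g(x) = 1 / (1 + e^(−x)) instead, which is continuous and differentiable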
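The difference is easy to see numerically. This sketch compares a central-difference estimate of the derivative for the threshold (step) function and for the sigmoid, and checks the sigmoid against its known closed form g'(z) = g(z) · (1 − g(z)); the sample points and eps are arbitrary.

```python
import math

def step(z):
    """Threshold activation: jumps from 0 to 1 at z = 0."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Smooth, differentiable replacement for the threshold function."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # g'(z) = g(z) * (1 - g(z))

def numerical_derivative(fn, z, eps=1e-6):
    # Central-difference approximation of fn'(z)
    return (fn(z + eps) - fn(z - eps)) / (2 * eps)

for z in (-2.0, 0.5, 2.0):
    print(z, numerical_derivative(step, z), numerical_derivative(sigmoid, z), sigmoid_derivative(z))
```

Away from 0 the step function's slope is exactly 0, giving gradient descent nothing to follow, while the sigmoid has a nonzero slope everywhere.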

What are potential problems when minimizing the error by gradient descent?

The error function may have several local minima, so gradient descent can get stuck in one of them; finding the global minimum is often very difficult

Minimizing the error on the training set too aggressively may destroy the performance on the test set (overfitting)
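The local-minimum problem shows up even in one dimension. This sketch runs plain gradient descent on a hypothetical error function E(w) = w⁴ − 3w² + w, which has a local minimum near w ≈ 1.13 and a lower, global minimum near w ≈ −1.30; the step size and number of steps are arbitrary.

```python
def error(w):
    # Hypothetical 1-D error function with two minima
    return w ** 4 - 3 * w ** 2 + w

def error_gradient(w):
    return 4 * w ** 3 - 6 * w + 1

def gradient_descent(w, alpha=0.01, steps=1000):
    for _ in range(steps):
        w -= alpha * error_gradient(w)  # small step downhill
    return w

for start in (2.0, -2.0):
    w = gradient_descent(start)
    print(f"start {start:+.1f} -> w = {w:.3f}, E(w) = {error(w):.3f}")
```

Starting at +2.0 ends in the local minimum (E ≈ −1.07), while starting at −2.0 reaches the global minimum (E ≈ −3.51): where gradient descent ends up depends on where it starts.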

What is the relation between deep learning and AI?

Machine learning is a subfield of AI and deep learning is a subfield of machine learning. Deep learning is a form of machine learning that makes use of artificial neural networks.

How can Machine Learning be defined?

(Mitchell 1997): “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

What is overfitting and how can you avoid it?

Training Set Error continues to decrease with increasing number of training examples / number of epochs. An epoch is a complete pass through all training examples

Test Set Error will start to increase because of overfitting

Simple training protocol:

keep a separate validation set to watch the performance. Validation set is different from training and test sets!

stop training if the error on the validation set goes up

How can you optimize weights in a perceptron?

With the perceptron learning rule for supervised learning

What role does the activation function play in natural and in machine neural networks?

The activation function has a certain threshold: if the input exceeds it, the neuron is activated; if not, it remains inactive

Artificial neurons model this behavior of natural neurons in a simplified way, either with a hard threshold or with a smooth function such as the sigmoid

What are the limitations of a single perceptron?

A perceptron is a linear classifier. Therefore it cannot solve tasks that are not linearly separable (e.g. XOR).

What is the typical structure of an artificial neural network?

The outputs of perceptrons can be fed into other perceptrons, giving an input layer, one or more hidden layers and an output layer

The size of the hidden layer(s) is chosen manually

What are the key ideas of convolutional neural networks and recurrent neural networks?

Convolutional Neural Networks

Convolution:

Convolutions can be encoded as network layers

each 3x3 pixel neighborhood of the input image is connected to the corresponding pixel in the next layer, using the same weights at every position

Convolutional layers are at the heart of image recognition

Several are stacked on top of each other, and several are applied in parallel to each other

Recurrent Neural Networks (RNN)

allow processing sequential data by feeding the output of the network back into the next input

Long Short-Term Memory (LSTM)

adds “forgetting” to RNNs

good for mapping sequential input data into sequential output data, e.g.: text to text, or time series to time series
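The "forgetting" in an LSTM comes from a forget gate that scales the old cell state before new content is added. This sketch implements one step of a single scalar LSTM cell in the common formulation with forget, input and output gates; all parameter names and values are hypothetical, chosen so that only the forget-gate bias differs between the two calls.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single (scalar) LSTM cell.

    p maps hypothetical parameter names to numbers: for each gate k in
    "f" (forget), "i" (input), "o" (output) and "c" (candidate) there is an
    input weight p["w" + k], a recurrent weight p["u" + k] and a bias p["b" + k].
    """
    def pre(k):
        return p["w" + k] * x + p["u" + k] * h_prev + p["b" + k]
    f = sigmoid(pre("f"))          # forget gate: how much old memory to keep
    i = sigmoid(pre("i"))          # input gate: how much new content to write
    o = sigmoid(pre("o"))          # output gate: how much memory to expose
    c_tilde = math.tanh(pre("c"))  # candidate new content
    c = f * c_prev + i * c_tilde   # new cell state (the "memory")
    h = o * math.tanh(c)           # new output / hidden state
    return h, c

def params(forget_bias):
    # All weights zero; input gate nearly closed; only the forget bias varies
    p = {name + k: 0.0 for name in "wub" for k in "fioc"}
    p["bf"], p["bi"] = forget_bias, -10.0
    return p

_, c_keep = lstm_step(0.0, 0.0, 1.0, params(5.0))     # forget gate near 1
_, c_forget = lstm_step(0.0, 0.0, 1.0, params(-5.0))  # forget gate near 0
print(round(c_keep, 3), round(c_forget, 3))  # 0.993 0.007
```

With the forget gate open the stored value 1.0 survives almost unchanged; with it closed the memory is erased. In a trained LSTM these gate values are computed from the input and previous output, so the network learns when to forget.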
