What are the automation levels of cars?
What are neural networks?
universal approximators
What is the high-level structure of universal approximators?
What is the benefit of ANN?
often no explicit input/output relation is known
=> an ANN can learn it by itself by using inputs with output labels
=> supervised learning
What does the universal approximation theorem say?
(even relatively simple) ANNs can approximate any continuous function with arbitrary accuracy
!! does not prove that the respective parameters and architecture can be found easily…
What is the simplest approximation?
linear regression
-> y = w · x + b
= sum over wi*xi + b
= w1x1 + w2x2 + … + b
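A minimal sketch of this in Python (the function name is illustrative):

```python
def linear_regression(w, x, b):
    """Weighted sum of inputs plus bias: sum(w_i * x_i) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

y = linear_regression([2.0, 3.0], [1.0, 1.0], 0.5)  # 2 + 3 + 0.5 = 5.5
```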
Basic idea of ANN w.r.t. linear regression?
based on linear regression
-> derive more complex forms of regression
by introducing non-linearities…
How does a basic neuron in an ANN look like?
dot product of weight vector and input vector, plus bias
What is a forward pass?
calculate the output of the neuron / NN w.r.t. provided input
=> allows calculating the loss for a training sample by putting the result of the forward pass y and the label y_hat into the loss function
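A sketch of a forward pass and the resulting loss (squared-error loss assumed as an example; no activation function yet):

```python
def forward(w, x, b):
    # neuron output: dot(w, x) + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def loss(y, y_hat):
    # squared error for one training sample
    return (y - y_hat) ** 2

y = forward([1.0, -2.0], [3.0, 1.0], 0.5)  # 3 - 2 + 0.5 = 1.5
print(loss(y, 1.0))                        # (1.5 - 1.0)**2 = 0.25
```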
What is the basic optimization problem of neural networks?
minimize the loss with respect to the weights and biases
=> basically: minimize the loss by adjusting the weights and biases…
What is gradient descent?
method to find optimal weights and biases based on the loss
-> calculate the gradient of the loss w.r.t. the individual weights and biases
=> tells the direction in which to adjust the weights (the gradient points in the direction of steepest ascent -> so it has to be taken negatively…)
stop criteria:
loss threshold is met
or after N iterations
What is the basic approach of gradient descent?
calculate gradient vector
update weights with learning rate
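The two steps above can be sketched on a 1-D toy loss; the loss L(w) = (w - 3)^2 and all names here are assumptions for illustration:

```python
def gradient_descent(w, lr=0.1, n_iters=100, tol=1e-8):
    """Minimize the toy loss L(w) = (w - 3)**2 by gradient descent."""
    for _ in range(n_iters):      # stop after N iterations ...
        grad = 2 * (w - 3)        # gradient of the toy loss w.r.t. w
        if grad ** 2 < tol:       # ... or when the gradient is (near) zero
            break
        w -= lr * grad            # update against the gradient, scaled by learning rate
    return w

w_opt = gradient_descent(w=0.0)   # converges toward the minimum at w = 3
```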
How to calculate the individual gradients?
chain rule
-> take the computation graph
and go backwards
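A sketch of going backwards through the graph for a single sigmoid neuron with squared-error loss (the setup and names are assumed examples): forward computes z = w*x + b and y = sigmoid(z), backward multiplies the local derivatives via the chain rule:

```python
import math

def backward(w, x, b, y_hat):
    # forward pass through the computation graph
    z = w * x + b                  # linear part
    y = 1 / (1 + math.exp(-z))     # sigmoid activation
    # backward pass: multiply local derivatives (chain rule)
    dL_dy = 2 * (y - y_hat)        # derivative of squared-error loss
    dy_dz = y * (1 - y)            # derivative of sigmoid
    dL_dw = dL_dy * dy_dz * x      # dz/dw = x
    dL_db = dL_dy * dy_dz          # dz/db = 1
    return dL_dw, dL_db
```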
What are some pitfalls in gradient descent?
stuck in local minimum
vanishing gradient
stuck on a plateau, especially with a small learning rate -> gradient is too small to make progress…
oscillating
e.g. learning rate too high: jumps from one side of the valley to the other without reaching the minimum -> use e.g. learning rate decay
jumping out of minima
e.g. learning rate too high, or minimum quite shallow…
How to overcome the limitations of linearity (enable classification)?
introduce non-linearity with an activation function
-> y = f(wx+b)
How does an ANN with the identity function as activation function behave?
same as linear regression
-> can also simply be left out… same result…
=> more generally: any linear activation function still only yields linear regression…
What is a problem with the step function?
differentiating yields either 0 or (at the jump) infinity…
=> not good for gradient descent…!
What function is used instead of the step function?
sigmoid
-> looks similar but is continuously differentiable and never actually reaches 0 or 1 (only gets infinitely close to them…)
=> well suited for binary classification
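A quick sketch of sigmoid's behavior (standard definition, not from the card):

```python
import math

def sigmoid(z):
    # smooth, continuously differentiable; output strictly between 0 and 1
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1, but never exactly 1
```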
What are potential downsides of sigmoid?
yields no discrete result -> rather a confidence score for the classification
-> can be desired, but if not: simply put a step function behind it after training when doing the actual classification…
Why is sigmoid computationally efficient?
derivative can be expressed via sigmoid itself: σ'(z) = σ(z)·(1 − σ(z)) -> forward-pass value can be reused…
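As a sketch: the derivative of sigmoid can be written using the sigmoid value itself, σ'(z) = σ(z)·(1 − σ(z)), so the activation already computed in the forward pass can be reused:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)       # reuse the already computed activation
    return s * (1 - s)
```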
What other activation functions are there?
relu
tanh
What is an epoch?
has passed when all training vectors have been used once to update the weights
Can a loss get to 0 with sigmoid?
no, never -> as sigmoid never reaches exactly 1 or 0…
What is a limit of single neurons?
does not help with regression for non-linearly separable data
mainly used to introduce a threshold between classes (in single neurons)
-> simple/naive approach: introduce non-linearity in the input…
=> requires knowing the properties a priori -> thus a neural network would not be needed at all, as the solution is already known…
How to allow ANNs to become universal approximators?
create networks of several neurons
-> input layer, hidden layers, output layer…
What e.g. cannot be solved with single neuron?
XOR problem
How can one solve XOR?
combining operators
How does solving XOR translate to ANNs?
combine NAND and OR ANNs…
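A hand-wired sketch of this idea (weights and biases chosen by hand for illustration; a trained network would learn them): two step-activated neurons compute NAND and OR, and a third combines them with an AND:

```python
def neuron(w, x, b):
    # perceptron-style neuron with step activation: fire if weighted sum + bias > 0
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def xor(x1, x2):
    nand = neuron([-1, -1], [x1, x2], 1.5)    # NAND: 1 unless both inputs are 1
    orr  = neuron([1, 1], [x1, x2], -0.5)     # OR: 1 unless both inputs are 0
    return neuron([1, 1], [nand, orr], -1.5)  # AND of NAND and OR = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # prints the XOR truth table
```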
What is MNIST dataset?
hand-written digits…
What is a hidden layer?
a layer whose neurons are not directly connected to input/output