Forward pass
computes the function / the layer output
caches values needed for gradient computation
Backward pass
takes the upstream derivative
returns all partial derivatives (local gradients chained with the upstream derivative)
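A minimal sketch of both passes for a fully connected layer (NumPy; the class and variable names are illustrative, not from the cards): forward computes y = x @ W + b and caches x, backward chains the cached input with the upstream derivative and returns all partial derivatives.

```python
import numpy as np

class Linear:
    """Fully connected layer y = x @ W + b (illustrative sketch)."""

    def __init__(self, in_features, out_features):
        self.W = np.random.randn(in_features, out_features) * 0.01
        self.b = np.zeros(out_features)
        self.cache = None

    def forward(self, x):
        # Forward pass: compute the function and cache what the
        # gradient computation needs later (here: the input x).
        self.cache = x
        return x @ self.W + self.b

    def backward(self, dy):
        # Backward pass: take the upstream derivative dy = dL/dy and
        # return all partial derivatives via the chain rule.
        x = self.cache
        dW = x.T @ dy          # dL/dW
        db = dy.sum(axis=0)    # dL/db
        dx = dy @ self.W.T     # dL/dx, passed further upstream
        return dx, dW, db
```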
Gradient update rule
wrt W: W <- W - learning_rate * dL/dW
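A minimal sketch of that update step (plain/vanilla gradient descent assumed; the learning rate and dummy shapes are made up):

```python
import numpy as np

learning_rate = 0.01                              # hyperparameter (assumed value)
W = np.ones((3, 2))                               # dummy weights
dW = np.full((3, 2), 0.5)                         # dummy gradient dL/dW from the backward pass

# Gradient update rule: step against the gradient direction.
W = W - learning_rate * dW
```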
Number of weights in a layer
#neurons * #input channels + #biases (one bias per neuron)
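Worked example for a fully connected layer with 64 input channels and 128 neurons; the PyTorch cross-check is only an assumption about the framework being used.

```python
import torch.nn as nn

neurons, input_channels = 128, 64
expected = neurons * input_channels + neurons     # weights + one bias per neuron = 8320

layer = nn.Linear(in_features=input_channels, out_features=neurons)
actual = sum(p.numel() for p in layer.parameters())
assert actual == expected                         # 8320 parameters
```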
What happens if the learning rate is too high / too low?
too high: training may diverge / oscillate and never converge, or it converges too fast to a suboptimum
too low: model converges very slowly
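Illustrative sketch: gradient descent on f(w) = w^2 (minimum at w = 0); the learning-rate values are made up to show the three regimes.

```python
def minimize_quadratic(lr, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2 with gradient df/dw = 2*w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(minimize_quadratic(lr=1.5))    # too high: w blows up instead of converging
print(minimize_quadratic(lr=0.001))  # too low: barely moves towards 0 after 20 steps
print(minimize_quadratic(lr=0.3))    # reasonable: ends up close to the minimum at 0
```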
Gradients for large training set?
compute gradients for individual training pairs / mini-batches -> take the average (stochastic / mini-batch gradient descent)
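A mini-batch sketch (NumPy, dummy data, mean-squared-error gradient of a linear model as a stand-in loss): gradients are computed per batch and then averaged instead of using the whole training set at once.

```python
import numpy as np

def grad_mse(w, x_batch, y_batch):
    """Gradient of the mean squared error for a linear model y_hat = x @ w."""
    err = x_batch @ w - y_batch
    return 2 * x_batch.T @ err / len(x_batch)

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))                  # large training set (dummy data)
y = rng.normal(size=10_000)
w = np.zeros(5)

# Instead of one gradient over all 10k pairs, average the per-batch gradients.
batch_size = 256
grads = [grad_mse(w, X[i:i + batch_size], y[i:i + batch_size])
         for i in range(0, len(X), batch_size)]
avg_grad = np.mean(grads, axis=0)
```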
Additive regularization: general goal
prevent overfitting / hopefully better generalization
make training harder so model learns better features
lower validation error / increase training error
add a penalty term to the loss, weighted with a factor lambda
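General pattern as a sketch (function and parameter names are hypothetical): the penalty on the weights is added to the task loss, scaled by lambda.

```python
import numpy as np

def regularized_loss(task_loss, weights, penalty_fn, lam=1e-3):
    """Additive regularization: task loss + lambda * penalty on the weights."""
    return task_loss + lam * penalty_fn(weights)

w = np.array([0.5, -2.0, 0.0, 3.0])
loss = regularized_loss(task_loss=1.25, weights=w, penalty_fn=lambda w: np.sum(w ** 2))
```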
Additive regularization: L1 characteristics
enforce sparsity (many weights driven to exactly zero)
focus on a few key features
Additive regularization: L2 characteristics
enforce small weights of similar magnitude
consider all features with small contributions instead of relying on a few
Additive regularization: L1 term
lambda * sum_i |w_i|
Additive regularization: L2 term
lambda * sum_i w_i^2
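Both penalty terms as a sketch (NumPy); the comments summarize why L1 leads to sparsity and L2 to small, similar weights.

```python
import numpy as np

def l1_term(w, lam):
    # L1: lambda * sum_i |w_i| -> constant-magnitude gradient, pushes weights to exactly 0
    return lam * np.sum(np.abs(w))

def l2_term(w, lam):
    # L2: lambda * sum_i w_i^2 -> gradient proportional to w, shrinks all weights evenly
    return lam * np.sum(w ** 2)

w = np.array([0.01, -0.5, 2.0])
print(l1_term(w, lam=1e-2), l2_term(w, lam=1e-2))
```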