Label a neuron.
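A minimal sketch of the labeled forward pass of a neuron (symbol names follow common convention and are not from the card):

$$z = \sum_{i=1}^{n} w_i x_i + b, \qquad y = \varphi(z)$$

where the $x_i$ are the inputs, the $w_i$ the weights, $b$ the bias, $z$ the pre-activation, $\varphi$ the activation function, and $y$ the output.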
Name two nonlinear activation functions and their derivatives.
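The card leaves the answer open; a possible answer uses sigmoid and tanh, the two saturating functions named in the ReLU card below:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$

$$\tanh'(x) = 1 - \tanh^2(x)$$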
Characteristics of high and low learning rates
High learning rates
Reduce the loss quickly, but may never settle into a minimum
Learning rates that are too high can even cause training to diverge (see the sketch after this list)
Low learning rates
Take a very long time to reach a minimum
Can get stuck at points with small gradients (plateaus, saddle points)
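A minimal sketch of both failure modes, using plain gradient descent on f(x) = x² (gradient 2x); the function and step sizes are illustrative choices, not from the card:

```python
def descend(lr, steps=10, x=1.0):
    """Run `steps` gradient descent steps on f(x) = x**2."""
    for _ in range(steps):
        x -= lr * 2 * x          # gradient of x**2 is 2x
    return x

print(descend(lr=0.01))  # low rate: still far from the minimum at 0 after 10 steps
print(descend(lr=0.4))   # moderate rate: converges quickly
print(descend(lr=1.1))   # too high: |x| grows every step, i.e. training diverges
```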
What are dynamic and cyclical learning rates?
Dynamic learning rates
Idea: Gradually reduce the learning rate over the course of training (learning rate decay)
Stepwise decay, linear decay, cosine decay (sketched below)
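A minimal sketch of the three decay schedules named above (function and parameter names are illustrative):

```python
import math

def stepwise_decay(lr0, epoch, drop=0.5, every=10):
    # Multiply the rate by `drop` every `every` epochs.
    return lr0 * drop ** (epoch // every)

def linear_decay(lr0, epoch, total_epochs):
    # Interpolate linearly from lr0 down to 0 over the whole run.
    return lr0 * (1 - epoch / total_epochs)

def cosine_decay(lr0, epoch, total_epochs):
    # Follow half a cosine wave from lr0 down to 0.
    return lr0 * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
```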
Cyclical learning rates
Idea: Use periodically high learning rates to escape local minima, running multiple cycles of decay
At the beginning of each cycle, the momentarily high learning rate can escape a possible local minimum (see the sketch below)
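A minimal sketch of a cyclical schedule as cosine decay with warm restarts (the cycle length is an illustrative choice):

```python
import math

def cosine_with_restarts(lr0, epoch, cycle_len=10):
    t = epoch % cycle_len  # position within the current cycle
    # The rate jumps back to lr0 at each restart, which can kick
    # the optimizer out of a local minimum.
    return lr0 * 0.5 * (1 + math.cos(math.pi * t / cycle_len))
```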
Principle of Adam
Keeps normalized running estimates of the mean and variance of each component of the gradient, yielding a custom learning rate for each parameter.
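A minimal sketch of one Adam update step (hyperparameter names and default values follow the original Adam paper; the function itself is illustrative):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running estimate of the gradient mean
    v = beta2 * v + (1 - beta2) * grad ** 2   # running estimate of the (uncentered) variance
    m_hat = m / (1 - beta1 ** t)              # bias-correct the zero-initialized estimates
    v_hat = v / (1 - beta2 ** t)
    # Dividing by sqrt(v_hat) normalizes the step per component,
    # effectively giving each parameter its own learning rate.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```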
Advantages of ReLU over sigmoid and tanh.
No vanishing gradients: the ReLU derivative is exactly 1 for all positive inputs, while sigmoid and tanh saturate (see the sketch below)
Computational savings: max(0, x) is much cheaper than evaluating exponentials
Sparsity: negative inputs map to exactly 0, so many activations are zero
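A minimal sketch comparing the gradients of sigmoid and ReLU at a few sample points (the inputs are illustrative):

```python
import numpy as np

x = np.array([-5.0, -1.0, 0.5, 5.0])

sigmoid = 1 / (1 + np.exp(-x))
sigmoid_grad = sigmoid * (1 - sigmoid)   # capped at 0.25, near 0 for large |x|
relu_grad = (x > 0).astype(float)        # exactly 1 for positive inputs, else 0

print(sigmoid_grad)  # approx. [0.0066 0.1966 0.2350 0.0066] -> vanishes at the tails
print(relu_grad)     # [0. 0. 1. 1.] -> constant gradient wherever the unit is active
```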