Specify the corresponding pattern-based (SGD) learning rule.
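The question does not restate the error function it refers to. As a sketch only, assuming the usual ADALINE sum-of-squares error over P patterns (x^(p), t^(p)) with weight vector w and learning rate η (all symbols chosen here for illustration, not taken from the source), the batch rule and the corresponding pattern-based rule would read:

```latex
\begin{align*}
E(w) &= \frac{1}{2} \sum_{p=1}^{P} \bigl(t^{(p)} - w^\top x^{(p)}\bigr)^2
  && \text{(assumed error function)} \\
\Delta w_{\text{batch}} &= -\eta \, \nabla_w E
  = \eta \sum_{p=1}^{P} \bigl(t^{(p)} - w^\top x^{(p)}\bigr)\, x^{(p)}
  && \text{(gradient descent over all patterns)} \\
\Delta w^{(p)} &= \eta \, \bigl(t^{(p)} - w^\top x^{(p)}\bigr)\, x^{(p)}
  && \text{(pattern-based / SGD: one pattern per update)}
\end{align*}
```

The pattern-based rule simply drops the sum and applies the update after each individual pattern.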
Name the most distinctive differences between the ADALINE model and the perceptron model.
• The perceptron generates a binary output, while ADALINE produces and optimizes a real-valued output.
• If the data is linearly separable, the perceptron converges in a finite number of updates, typically faster than ADALINE.
• ADALINE is used to approximate the separating hyperplane more than it is used to classify (classification
could more easily be achieved using the perceptron).
• The perceptron uses the predicted class labels to learn the model coefficients.
• ADALINE uses the continuous predicted values to learn the model coefficients, which is more informative
since the error tells us by how much the prediction was right or wrong.
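The difference between the two error signals can be sketched in a few lines of Python. This is a minimal illustration, assuming labels in {-1, +1}, a learning rate `eta`, and no bias term; the function names `perceptron_update` and `adaline_update` are hypothetical and chosen only for this example.

```python
import numpy as np

def perceptron_update(w, x, t, eta=0.1):
    """One perceptron step: the error uses the thresholded (binary) output."""
    y = 1.0 if w @ x >= 0 else -1.0   # predicted class label in {-1, +1}
    return w + eta * (t - y) * x      # zero update when the label is already correct

def adaline_update(w, x, t, eta=0.1):
    """One ADALINE (delta rule) step: the error uses the raw linear output."""
    y = w @ x                         # real-valued output, no threshold
    return w + eta * (t - y) * x      # update scales with how far y is from t

# Illustrative values (assumed, not from the source).
w = np.array([0.5, -0.2])
x = np.array([1.0, 2.0])
t = 1.0

w_perc = perceptron_update(w, x, t)   # w @ x = 0.1 >= 0, label correct: w unchanged
w_ada = adaline_update(w, x, t)       # error 0.9, so the weights still move
```

In this example the perceptron leaves the weights untouched because the sample is already classified correctly, whereas ADALINE still adjusts them because the linear output 0.1 is far from the target 1.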
What advantages do pattern-based learning rules have?
(a) Efficiency: Each SGD update is computationally cheaper, especially for large-scale datasets. Instead of computing
the sum of the gradients over all samples (as in batch GD), SGD computes the gradient of only one sample
(or a small batch of samples) at each iteration.
(b) Noise helps to escape local minima: The noisy updates in SGD can help the optimizer leave shallow local
minima and find a better (potentially global) minimum, since the noise can provide the kick needed
to get out of them.
(c) Online learning: SGD allows for online learning, meaning it can handle new data on the go,
updating the model parameters as each new sample arrives. This is not possible with batch GD, which
requires the entire dataset to be available and static.
(d) Memory usage: SGD uses less memory, as it only needs to hold a single data point or a small batch at a time,
whereas batch GD needs access to the entire dataset for every update. This makes SGD a better choice for
large-scale datasets.
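Points (a), (c), and (d) can be illustrated with a small sketch. It assumes a linear model with squared error and noise-free targets; the names `batch_gd_step`, `sgd_step`, and `w_true`, as well as the learning rates and sample counts, are hypothetical choices for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])        # assumed ground-truth weights

def batch_gd_step(w, X, t, eta):
    """Batch GD: one update needs the gradient averaged over the whole dataset."""
    grad = -(t - X @ w) @ X / len(X)
    return w - eta * grad

def sgd_step(w, x, t_i, eta):
    """Pattern-based (SGD) update: one update needs only a single sample."""
    return w + eta * (t_i - w @ x) * x

# Batch GD: the full dataset must be stored and revisited on every step.
X = rng.normal(size=(200, 2))
t = X @ w_true
w_gd = np.zeros(2)
for _ in range(200):
    w_gd = batch_gd_step(w_gd, X, t, eta=0.1)

# SGD in an online setting: samples arrive one at a time and are discarded
# right after the update, so only the current sample is held in memory.
w_sgd = np.zeros(2)
for _ in range(4000):
    x_new = rng.normal(size=2)        # a new sample "arriving" from a stream
    t_new = w_true @ x_new
    w_sgd = sgd_step(w_sgd, x_new, t_new, eta=0.05)
```

Both loops approach `w_true` in this noise-free setting, but the SGD loop never keeps more than one sample around and can continue for as long as new data arrives.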