Perceptron & Adaline

What advantages do pattern-based learning rules have?

(a) Efficiency: Each SGD update is computationally much cheaper, especially for large-scale datasets.
Instead of summing the gradients of all samples (as in GD), SGD computes the gradient of only one sample
(or a small batch of samples) at each iteration.

(b) Noise helps to escape local minima: The noisy updates in SGD can help the search escape shallow local
minima and reach a better (potentially global) minimum, because the noise can provide the kick needed
to leave a shallow minimum.

(c) Online learning: SGD can learn on the go, updating the model parameters as new data comes in. This is
not possible with GD, which requires the entire dataset to be available and static.

(d) Memory usage: SGD uses less memory, as it only needs to hold a single data point or a small batch in
memory at a time, whereas GD needs access to the entire dataset for every update. This makes SGD a
better choice for large-scale datasets.
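
The contrast between the two update schemes can be sketched for an Adaline trained with the delta (LMS) rule. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (predict, batch_gd_step, sgd_step, train_on_stream, make_stream), the learning rates, and the synthetic linear target are all hypothetical choices made for the example.

```python
import random

def predict(w, b, x):
    """Linear Adaline output w.x + b (no thresholding)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def batch_gd_step(w, b, X, y, lr):
    """Batch GD: accumulate the error gradient over ALL samples, then update once."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, t in zip(X, y):
        err = t - predict(w, b, x)
        for j, xj in enumerate(x):
            grad_w[j] += err * xj
        grad_b += err
    new_w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
    return new_w, b + lr * grad_b / n

def sgd_step(w, b, x, t, lr):
    """Pattern-based (SGD) update: uses a single sample (x, t) only."""
    err = t - predict(w, b, x)
    new_w = [wj + lr * err * xj for wj, xj in zip(w, x)]
    return new_w, b + lr * err

def train_on_stream(stream, n_features, lr=0.1):
    """Online learning: consume samples one at a time; nothing is stored."""
    w, b = [0.0] * n_features, 0.0
    for x, t in stream:
        w, b = sgd_step(w, b, x, t, lr)
    return w, b

def make_stream(n, seed=0):
    """Hypothetical data source: yields (x, t) with t = 2*x0 - x1 + 0.5."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 2.0 * x[0] - 1.0 * x[1] + 0.5

def mse(w, b, X, y):
    """Mean squared error of the linear output on a dataset."""
    return sum((t - predict(w, b, x)) ** 2 for x, t in zip(X, y)) / len(X)
```

Here train_on_stream(make_stream(2000), n_features=2) only ever sees one sample at a time, which corresponds to points (a), (c) and (d), whereas batch_gd_step needs the full lists X and y for every single update. The target in this sketch is noise-free and the squared error of a linear unit has a single minimum, so point (b) is not exercised by it.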