
Geo Data Science

by Julia S.

What are the different types of Gradient Descent? Explain the differences between them.

There are three types of Gradient Descent. They differ in how much of the training data they use to compute the gradient at each step; a minimal sketch of all three update rules follows after the lists below.


Batch Gradient Descent:

  • Uses all the training data to compute the gradient before each update -> slow on large datasets, but scales well with the number of features.

  • Each step is one epoch (=one complete pass through the entire dataset)


Stochastic Gradient Descent:

  • picks one random training instance at every step and computes the gradient on only that instance -> faster, but might have problems with blunders (gross outliers), since each update is driven by a single instance

  • jumps around a lot -> might help escape local minima!, but it ends up very close to the minimum without ever settling exactly on it

  • is good for huge datasets that don’t fit into memory, as only one instance is needed per step. An epoch is counted after as many steps as there are training instances: if there are x training instances, Stochastic Gradient Descent is said to have seen one epoch after x steps (even though it may not actually have seen every instance, because it picks at random!)

  • to lower the drawbacks of randomness: start training with a large learning rate to escape local minima, then gradually reduce it so the algorithm settles at the global minimum. This process is akin to simulated annealing


Mini-Batch Gradient Descent:

  • the gradient is computed on small random subsets of instances, called mini-batches; only one mini-batch has to be processed at a time, therefore fewer problems with huge datasets

  • will be closer to minimum than Stochastic Gradient Descent

  • jumps around less -> harder to escape from local minima
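
A minimal NumPy sketch of the three update rules, my own illustration for linear regression with a squared-error cost (the names X, y, eta, n_epochs and batch_size are assumptions, not from the lecture):

import numpy as np

# toy data: X (m x n features), y (m targets); theta are the model parameters
rng = np.random.default_rng(0)
m, n = 200, 3
X = rng.normal(size=(m, n))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=m)

def gradient(X_b, y_b, theta):
    # gradient of the mean squared error for linear regression
    return 2.0 / len(y_b) * X_b.T @ (X_b @ theta - y_b)

eta, n_epochs, batch_size = 0.05, 50, 16

# Batch GD: the full dataset for every single update (1 step = 1 epoch)
theta = np.zeros(n)
for epoch in range(n_epochs):
    theta -= eta * gradient(X, y, theta)

# Stochastic GD: one random instance per update, learning rate slowly reduced
theta = np.zeros(n)
for epoch in range(n_epochs):
    for step in range(m):                             # "one epoch" = m steps
        i = rng.integers(m)
        lr = eta / (1 + 0.01 * (epoch * m + step))    # simple learning schedule
        theta -= lr * gradient(X[i:i+1], y[i:i+1], theta)

# Mini-batch GD: small random subsets (mini-batches) per update
theta = np.zeros(n)
for epoch in range(n_epochs):
    idx = rng.permutation(m)
    for start in range(0, m, batch_size):
        batch = idx[start:start + batch_size]
        theta -= eta * gradient(X[batch], y[batch], theta)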



Explain the principle of Support Vector Machine and the two classifications.

Support Vector Machine (SVM) is a model for binary classification that can handle non-linear problems by using kernels; it can also be used for regression and for multi-class classification.

The idea is to create the largest possible margin between the classes and thereby the best possible line/polynomial (= decision boundary) to separate them.

This decision boundary is determined by the data instances closest to it, which are called support vectors. The fitted boundary is equally influenced by the support vectors on either side (the margin has the same size on both sides).

There is hard margin classification and soft margin classification:

Hard Margin Classification requires that all training instances lie outside the margin and on the correct side. This makes the method sensitive to outliers.

Soft Margin Classification is more flexible and tries to find a good balance between a large margin and a limited number of margin violations. It uses a hyperparameter C which sets the penalty added to the cost function for each misclassified training sample (small C -> more misclassified samples allowed -> more regularization, which can reduce overfitting).


Therefore, in the (linear) cost function, a point that lies outside or on the margin does not contribute to the cost; a point inside the margin contributes a cost that grows with its distance from the margin boundary (the closer to the decision boundary, the higher the cost).
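
A small sketch of that hinge-loss idea, my own illustration (w, b, C and the sample points are assumed values):

import numpy as np

# hypothetical linear decision function f(x) = w . x + b, labels t in {-1, +1}
w, b, C = np.array([2.0, -1.0]), 0.5, 1.0
X = np.array([[ 1.5, 0.0],    # far outside the margin  -> no cost
              [ 0.2, 0.1],    # inside the margin       -> some cost
              [-1.0, 0.0]])   # wrong side of boundary  -> large cost
t = np.array([1, 1, 1])

scores = X @ w + b
# hinge loss: zero when t*score >= 1 (outside/on the margin),
# grows linearly the deeper the point sits inside the margin
hinge = np.maximum(0.0, 1.0 - t * scores)
cost = 0.5 * w @ w + C * hinge.sum()   # soft-margin objective
print(hinge, cost)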


For non-linear problems, kernel functions need to be used which allow for a non-linear separation but without adding polynomial features. Examples are polynomial and Gaussian Radial Basis Function (RBF).
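
For illustration, a minimal scikit-learn sketch of both kernels (the toy dataset and the parameter values are assumptions, not from the lecture):

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# non-linearly separable toy data
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Gaussian RBF kernel: non-linear separation without adding polynomial features
rbf_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=5, C=0.001))
rbf_clf.fit(X, y)

# polynomial kernel as an alternative
poly_clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1, C=5))
poly_clf.fit(X, y)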

How does Feature Extraction for 3D Point Clouds work?

Before features can be extracted, a neighborhood must be defined.

When that is done, the point set is centered. Then, from the locations of the neighborhood points, a 3x3 covariance matrix (3x3 because of the x, y, z coordinates) is computed which describes how the points in the neighborhood are spread out in space - it shows how much the points vary along and between the axes. For example, if the points vary strongly in y, sigma_yy will be large.

An eigenvector of the covariance matrix is a direction that doesn't change when the matrix is applied to it; only its length is scaled. The eigenvalue tells you how much it is scaled.

Imagine the point cloud in a neighborhood forms a blob in space: The eigenvectors are the directions of the blob’s principal axes. The eigenvalues tell you how stretched the point cloud is along those directions. Basically it’s like fitting a 3D ellipsoid to the neighborhood. The axes of the ellipsoid are the eigenvectors, and the lengths of these axes correspond to the eigenvalues.

You get 3 eigenvalues (l1: direction with most spread, l2: second-most variation, l3: least variation) & 3 orthogonal eigenvectors.
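
A minimal NumPy sketch of these two steps (centering/covariance and eigendecomposition) for one neighborhood; the variable neighborhood is an assumed (N, 3) array of x, y, z coordinates:

import numpy as np

# assumed neighborhood: N points with x, y, z coordinates, shape (N, 3)
rng = np.random.default_rng(1)
neighborhood = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.2])  # elongated blob

centered = neighborhood - neighborhood.mean(axis=0)   # center the point set
cov = centered.T @ centered / len(centered)           # 3x3 covariance matrix

# eigh returns eigenvalues in ascending order for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(cov)
l3, l2, l1 = eigvals                                  # so that l1 >= l2 >= l3
print(l1, l2, l3)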

From the eigenvalues, features can be extracted, for example the linearity of the neighborhood, its sphericity, or the change of curvature (which measures how curved the surface is); see the sketch below. For this, the neighborhood must be defined in a meaningful way: cylindrical neighborhoods, for example, avoid mixing vertical structures like trees with the ground. In general, this is why using multi-type, multi-scale neighborhoods is very useful.
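
Continuing the sketch above, such eigenvalue-based features could be computed like this (these formulas are common standard choices, including planarity as an extra; the exact set used in the course may differ):

# eigenvalue-based shape features (l1 >= l2 >= l3 from the sketch above)
linearity  = (l1 - l2) / l1               # close to 1 for line-like structures
planarity  = (l2 - l3) / l1               # close to 1 for planar structures
sphericity = l3 / l1                      # close to 1 for blob-like structures
change_of_curvature = l3 / (l1 + l2 + l3) # how curved the local surface is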

Additionally, binning can be used, which divides the 3D space into voxels and analyzes each voxel separately in terms of mean height, number of points, etc.
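
A small sketch of such binning (the voxel size and the per-voxel statistics are assumptions):

import numpy as np

rng = np.random.default_rng(2)
points = rng.normal(size=(1000, 3)) * 10.0       # assumed point cloud, shape (N, 3)
voxel_size = 2.0

# assign every point to a voxel via integer grid indices
voxel_idx = np.floor(points / voxel_size).astype(int)
voxels, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)

for v in range(len(voxels)):
    pts = points[inverse == v]
    mean_height = pts[:, 2].mean()               # mean z per voxel
    n_points = len(pts)                          # point count per voxel
    # ... further per-voxel statistics (height range, point density, etc.)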

