What is the basic principle of machine learning?
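a model is a function with adjustable parameters
during the learning process the parameters are adjusted to optimize an objective (e.g. minimize the prediction error on training data)
the trained function is then used to make predictions on new, unseen data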
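A minimal sketch of this principle, assuming a linear model y = w*x + b fitted by gradient descent on the mean squared error (one common choice, not the only way to optimize the parameters). The names fit_linear and predict are hypothetical. The parameters w and b start at zero and are adjusted step by step, then the fitted function predicts on a new input.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # parameters, adjusted during learning
    n = len(xs)
    for _ in range(epochs):
        # gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # step against the gradient to reduce the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Use the trained function on new data."""
    return w * x + b


# toy training data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
print(round(predict(w, b, 10.0), 2))
```

The learning rate and number of epochs are illustrative; too large a learning rate makes the updates diverge.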
What are regression and classification models?
regression models return a number (a continuous value, e.g. a price)
classification models return the class (a discrete label) of a data point
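To make the difference concrete, here is a sketch using k-nearest neighbours (a technique not named in the original, chosen because it works for both tasks). The functions and the toy data are hypothetical. The regression variant averages the neighbours' numeric targets and returns a number; the classification variant takes a majority vote and returns a class label.

```python
from collections import Counter


def k_nearest(train, x, k):
    """Return the k (feature, target) pairs whose 1-D feature is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


def knn_regress(train, x, k=3):
    # regression: average of the neighbours' numeric targets -> a number
    neighbours = k_nearest(train, x, k)
    return sum(t for _, t in neighbours) / k


def knn_classify(train, x, k=3):
    # classification: majority vote over the neighbours' labels -> a class
    neighbours = k_nearest(train, x, k)
    return Counter(t for _, t in neighbours).most_common(1)[0][0]


# hypothetical data: (size, price) and (size, size class)
prices = [(50, 150.0), (60, 180.0), (70, 210.0), (120, 360.0)]
sizes = [(50, "small"), (60, "small"), (70, "small"), (120, "large"), (130, "large")]
print(knn_regress(prices, 65))
print(knn_classify(sizes, 125))
```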
What is k-fold cross-validation?
estimates how well a trained model will generalize to independent data
original sample is divided into k (roughly) equal-sized subsamples, called folds
one fold is used as validation data, the remaining k-1 folds as training data
the model trained on the k-1 folds is evaluated on the validation fold
repeat k times, so that every fold is used exactly once as validation data; average the k results
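A minimal sketch of the procedure above. The helpers k_fold_splits and cross_validate, as well as the "predict the training mean" model scored by mean squared error, are hypothetical stand-ins for a real model and metric. The indices are shuffled once; when n is not divisible by k, fold sizes differ by at most one.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once before splitting
    # strided slicing gives k folds whose sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, average over all k folds."""
    scores = []
    for train_idx, val_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


def fit_mean(X, Y):
    # hypothetical model: always predicts the mean training target
    return sum(Y) / len(Y)


def mse(model, X, Y):
    # mean squared error of the constant prediction on the validation fold
    return sum((y - model) ** 2 for y in Y) / len(Y)


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validate(xs, ys, k=5, fit=fit_mean, score=mse))
```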
Name supervised and unsupervised machine learning types and advantages for each group.
supervised - usually the best-performing classifiers, but they require labelled training data
Support vector machines (SVM)
Random forest (RF)
Neural networks (NN)
unsupervised - groups data based on similarity; does not require labelled data
Clustering algorithms
k-means
Mixture of Gaussians
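k-means from the list above can be sketched in a few lines: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its members, until the centroids stop changing. This is a minimal sketch assuming initialization with k randomly chosen data points (other schemes such as k-means++ exist); the function name kmeans is hypothetical and points are tuples of numbers.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Return (centroids, labels) after running Lloyd's k-means iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iters):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The result depends on the initialization, so in practice k-means is often run several times with different seeds.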
What is the silhouette index?
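measures the quality of a clustering: how well each data point fits its own cluster compared to the nearest other cluster
s(i) = (b(i) - a(i)) / max(a(i), b(i)), so -1 <= s(i) <= 1
a(i) - mean distance of data point i to the other members of its own cluster
b(i) - smallest mean distance of data point i to the members of any other cluster
s(i) close to 1: well clustered; close to 0: on the border between clusters; negative: probably in the wrong cluster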
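A direct translation of the formula into code, as a sketch. The function name silhouette_scores is hypothetical; it assumes at least two clusters and uses the common convention s(i) = 0 for a point that is alone in its cluster.

```python
import math


def silhouette_scores(points, labels):
    """Return s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a(i): mean distance to the other members of the same cluster
        same = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) - {lab}
        )
        scores.append((b - a) / max(a, b))
    return scores


points = [(0,), (1,), (10,), (11,)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
print(sum(good) / len(good), sum(bad) / len(bad))
```

Averaging s(i) over all points gives one number for the whole clustering, which can be used to compare, for example, different values of k.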
What is a very simple way to generate more data?
data augmentation
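e.g. translation, scaling, rotation of images
very cheap
but: how many augmentations are useful? augmented samples are strongly correlated with the originals, so relying on too many of them can still lead to overfitting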
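A minimal sketch of the three image augmentations named above, assuming a grayscale image stored as a list of rows. The function names are hypothetical. To keep it dependency-free, rotation is limited to 90-degree steps and scaling to integer nearest-neighbour upscaling; arbitrary angles and factors need interpolation, which image libraries provide.

```python
def translate(img, dx, dy, fill=0):
    """Shift the image by dx columns and dy rows, filling uncovered pixels."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # source pixel that lands at (x, y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def scale_nearest(img, factor):
    """Upscale by an integer factor by repeating each pixel."""
    return [[v for v in row for _ in range(factor)] for row in img for _ in range(factor)]


img = [[1, 2],
       [3, 4]]
# each call yields a new, slightly different training sample from the same image
augmented = [translate(img, 1, 0), rotate90(img), scale_nearest(img, 2)]
for a in augmented:
    print(a)
```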