What is the generalization error?
The generalization error is the expected loss on future data for a given model
What is the formula for the generalization error?
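A standard way to write it (the lecture's exact notation may differ): for a model f, a loss function L and the true data distribution P,
R(f) = \mathbb{E}_{(x, y) \sim P}\left[ L(y, f(x)) \right],
i.e. the expected loss of f over data drawn from P.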
What is the formula for the empirical risk?
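In the same (assumed) notation, for N training samples (x_i, y_i):
R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)),
i.e. the average loss on the observed data, used as an estimate of the generalization error.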
What is a common assumption when estimating the generalization error via the empirical risk?
That the data is independent and identically distributed (i.i.d)
Does a good estimation for the generalization error via the empirical risk require that the data is Gaussian distributed?
No, that does not have to be the case
Does a good estimation for the generalization error via the empirical risk require that there is a large number of samples?
Yes. A high number of samples is crucial for a good estimation of the generalization error
Does a good estimation for the generalization error require a differentiable loss function?
No, in fact it does not. The 0-1 loss, for example, is not differentiable.
Does estimating the risk on samples which have not been used in training avoid the problem of underfitting?
No. Underfitting is a problem which results from a model that is too coarse to fit the data.
Does the model variance decrease with increasing complexity of the model?
No. The model variance increases with increasing complexity
Do higher degrees of freedom imply a lower risk for overfitting?
No, in fact the opposite: the higher the degrees of freedom, the higher the risk of fitting to noise.
Does supervised machine learning use explicit knowledge to design models deductively?
No. Supervised machine learning tries to design models inductively with given training data
Does the probabilistic model, the Optimal Bayes classifier, predict the most probable outcome for a new sample and use a loss function?
Yes. The Optimal Bayes classifier makes the most probable prediction and uses a loss function, e.g. the 0-1 loss.
What indicates underfitting?
A large error on train and test set
Do ROC curves allow classifiers to be evaluated independently of class distribution and misclassification cost?
Yes, they do. ROC curves are designed to assess the general performance of the discriminant function.
Is it possible that cross validation helps to optimize hyperparameters?
Yes. Cross-validation is commonly used for exactly that: candidate hyperparameter settings are compared by their performance on held-out folds.
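As an illustration only (not part of the original card), a minimal Python sketch using scikit-learn: the dataset and the grid of candidate C values are purely illustrative, and the hyperparameter C of an SVM is tuned by 5-fold cross-validation.

# Minimal sketch: tune the hyperparameter C of an SVM with 5-fold CV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative candidate values for the hyperparameter C.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # C with the best mean validation accuracy
print(search.best_score_)   # the corresponding cross-validated score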
Does the Bias increase with increasing model complexity?
No. The bias decreases with increasing model complexity.
What does a very low k in the k-nearest-neighbor algorithm lead to?
It leads to overfitting
What does a too high k in the k-nearest-neighbor algorithm lead to?
It leads to underfitting: for a very large k the model simply assigns the class label of the dominant class.
Do high k’s in the k-nearest-neighbor algorithm lead to high complexity?
No. k is a hyperparameter that controls model complexity: a low k yields a very flexible, overfitting-prone model, while a high k smooths the prediction and lowers the complexity.
Does k-nn work for unlabeled data?
No, k-nn only works for labeled data, but it works for classification as well as for regression.
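A minimal Python sketch (scikit-learn assumed; the dataset and the values of k are purely illustrative) showing k-nn on labeled data and the effect of k; KNeighborsRegressor would cover the regression case in the same way.

# Minimal sketch: k-NN classification on labeled data for different k.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # labeled data is required

for k in (1, 5, 50):  # a small k tends to overfit, a very large k to underfit
    clf = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"k={k}: mean CV accuracy = {score:.3f}")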
Do kernels transform data into a lower-dimensional space where separability can be achieved more easily?
No, in fact kernels (implicitly) map the data into a higher-dimensional space.
What is the formula of the Gaussian kernel?
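A common form (parameterizations vary; some texts use \gamma = 1 / (2\sigma^2)):
k(x, x') = \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2\sigma^2} \right)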
Does the constrained convex optimization problem have a unique global solution?
Yes. This is one of the major advantages of SVMs
Do SVMs allow a probabilistic explanation for classification?
No, they don’t. They either classify as class 1 or -1 (in a binary classification setting)
When is a sample a support vector?
If and only if the corresponding Lagrange multiplier is greater than 0
What are the different KKT conditions?
alpha_1 * h_1 = 0
alpha_2 * h_2 = 0
These result in four cases:
alpha_1 = 0, h_1 < 0
alpha_1 > 0, h_1 = 0
alpha_2 = 0, h_2 < 0
alpha_2 > 0, h_2 = 0
When do SVMs work effectively?
If the number of dimensions is much larger than the number of samples
Do SVMs use the magnitude of the discriminant function for regression?
Yes. In regression, SVMs use both the sign and the magnitude of the discriminant function, whereas in classification they only use the sign.
How many classifiers are there in a multiclass classification setting of the one vs. the rest/all approach (SVMs)?
M classifiers, one per class
The classifier with the largest value is chosen
How many classifiers are there in a multiclass classification setting of the one vs. one approach (SVMs)?
M(M - 1) / 2 classifiers, one for each pair of the M classes
The class with the majority of votes is chosen (see the sketch below)
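A minimal Python sketch (scikit-learn assumed; the digits dataset with M = 10 classes is chosen only for illustration) that counts the binary classifiers fitted by each approach.

# Minimal sketch: number of binary classifiers in one-vs-rest vs. one-vs-one.
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # M = 10 classes

ovr = OneVsRestClassifier(SVC()).fit(X, y)
ovo = OneVsOneClassifier(SVC()).fit(X, y)

print(len(ovr.estimators_))  # 10 = M
print(len(ovo.estimators_))  # 45 = M(M - 1) / 2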
Do decision trees perform well on large datasets?
Yes, since they narrow down the data at each decision step.
Are decision trees only suited for numerical or for categorical data?
They can be fitted to both types of data.
When is a split maximizing the information gain?
If the produced subsets are homogeneous (pure)
What do decision trees do recursively?
They recursively split the data into subsets
When is the minimal Gini impurity achieved?
When each produced subset is pure, i.e. contains only samples of a single class; a uniform distribution of 1/M per class would instead maximize the Gini impurity
What is information gain?
It is a splitting criterion for decision trees: the reduction in entropy achieved by a split
Is cross-validation a method to estimate the generalization error?
Yes. Cross-validation splits the training data into several folds and repeatedly trains on all but one fold, using the held-out fold as a validation set.
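A minimal Python sketch (scikit-learn assumed; the decision tree is just a placeholder model) of the fold mechanics described above.

# Minimal sketch: 5-fold cross-validation, each fold serves once as validation set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = []

for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

# The mean validation score is the cross-validation estimate of performance.
print(np.mean(scores))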
Does a too small model class complexity lead to underfitting?
Yes, this is very often the case
Does regularization increase the gap between training error and generalization error?
No. Regularization reduces overfitting and therefore tends to decrease both this gap and the generalization error.
Can the generalization error be computed exactly by using the test data?
No. Using the test data is just an estimate for the generalization error
Does increasing the value of C in C-SVMs cause the margin to shrink?
Yes. A high value of C means that slack is heavily penalized, and therefore the margin shrinks.
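A minimal Python sketch (scikit-learn assumed; the synthetic blobs and the C values are purely illustrative): for a linear SVM the margin width is 2 / ||w||, so it can be read off the fitted weight vector for different C.

# Minimal sketch: larger C penalizes slack more and typically shrinks the margin.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2 / np.linalg.norm(clf.coef_[0])  # margin width of a linear SVM
    print(f"C={C}: margin width = {width:.3f}")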
Do the training and test data set have to be disjoint?
Yes. Training and test data set should not have any sample in common
Does supervised machine learning use supervisory signals for predictive modeling?
Yes, supervised machine learning uses labeled training data to design models for future data
What is the form of the dual problem?
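For the soft-margin C-SVM in its usual form (notation may differ from the lecture), with kernel k and Lagrange multipliers alpha_i:
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)
subject to 0 \le \alpha_i \le C and \sum_{i=1}^{N} \alpha_i y_i = 0.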
What is a common splitting criterion for regression?
Variance reduction
Are decision tree algorithms non-recursive?
No, in fact they use a recursive approach.
What can a new input sample in decision trees in a classification task be assigned to?
A class value or a conditional probability
Do shallow trees tend to underfit and deep trees tend to overfit?
Yes, as a deeper tree encodes more conditions and can fit the training data more closely.
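A minimal Python sketch (scikit-learn assumed; the dataset and depths are purely illustrative) comparing training and test accuracy for different maximum tree depths.

# Minimal sketch: compare training and test accuracy for different maximum depths;
# deep trees typically reach (near-)perfect training accuracy but a larger train-test gap.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, None):  # None lets the tree grow until all leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 3), round(tree.score(X_te, y_te), 3))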
What is the form of the entropy (important for information gain)?
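For a subset S with class proportions p_1, ..., p_M (standard definition; the base of the logarithm may vary):
H(S) = - \sum_{m=1}^{M} p_m \log_2 p_m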
What is the form of the Gini impurity?
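With the same class proportions p_m (standard definition):
G(S) = 1 - \sum_{m=1}^{M} p_m^2 = \sum_{m=1}^{M} p_m (1 - p_m)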
What is the form of the primal problem?
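The usual soft-margin formulation (notation may differ from the lecture), with slack variables \xi_i:
\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
subject to y_i (w^\top x_i + b) \ge 1 - \xi_i and \xi_i \ge 0 for all i.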
What is the margin (SVM)?
It is the shortest distance between the observations and the decision boundary (separating hyperplane)