What is the Error Rate in Decision Trees?
In decision trees, the error rate refers to the proportion of incorrect predictions made by the model.
You can calculate a "split" in a decision tree in different ways. What are possible ways? (See the sketch after the list.)
Error rate
Information gain
Gini impurity (Gini index)
Variance reduction
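As a minimal sketch of how two of these criteria are computed for a single node (plain Python; the example counts are made up):

```python
from collections import Counter

def node_error_rate(labels):
    """Error rate: fraction misclassified when the node predicts its majority class."""
    counts = Counter(labels)
    majority = counts.most_common(1)[0][1]
    return 1 - majority / len(labels)

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

# Node with 6 'yes' and 2 'no' samples
node = ["yes"] * 6 + ["no"] * 2
print(node_error_rate(node))  # 0.25
print(gini_impurity(node))    # 1 - (0.75^2 + 0.25^2) = 0.375
```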
High entropy = high uncertainty or low uncertainty?
High entropy -> high uncertainty
How is entropy used to split in a decision tree?
Calculate entropy before and after the split.
What is information gain in a decision tree, in the context of entropy?
Information gain = initial entropy of the target variable - conditional entropy of the target variable given the feature: IG(T, a) = H(T) - H(T | a)
-> The split is made where IG is highest
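A minimal sketch of this computation (plain Python; function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(T) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """IG = H(T) minus the weighted average entropy of the groups after the split."""
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - conditional

# 4 'yes' / 4 'no' split into two pure groups -> IG = 1 bit
labels = ["yes"] * 4 + ["no"] * 4
print(information_gain(labels, [["yes"] * 4, ["no"] * 4]))  # 1.0
```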
Question: What is overfitting in decision trees?
Overfitting in decision trees occurs when the model is too complex and fits the noise in the training data, leading to poor performance on unseen data. Common causes include having too many branches, a tree that is too deep, or using too many features or variables.
Decision tree: What is pruning?
"Pruning" means removing nodes from a tree
after training has finished
Redundant nodes are removed, sometimes tree is remodeled
- Remove a non-leaf node from the tree
- Evaluate the performance without the pruned node
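As a concrete example, scikit-learn implements post-pruning as cost-complexity pruning (the ccp_alpha hyperparameter) rather than the literal remove-and-evaluate loop above; a rough sketch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning strengths for this training set
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Keep the alpha whose pruned tree performs best on held-out data
best = max(
    path.ccp_alphas,
    key=lambda a: DecisionTreeClassifier(random_state=0, ccp_alpha=a)
    .fit(X_train, y_train)
    .score(X_val, y_val),
)
print("best ccp_alpha:", best)
```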
Stopping criteria are sometimes referred to as
"pre-pruning"
What are possible stopping criteria? (See the sketch after the list.)
Stop when the absolute number of samples is low (below a threshold)
Stop when the entropy is already relatively low (below a threshold)
Stop when the information gain is low
Stop when the decision could be random (chi-squared test)
Thresholds are hyperparameters
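Most of these criteria map directly onto hyperparameters of common implementations; in scikit-learn, for instance (the chi-squared test has no direct equivalent there):

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: the tree stops growing wherever a threshold is hit
tree = DecisionTreeClassifier(
    min_samples_split=20,        # stop when the absolute number of samples is low
    min_impurity_decrease=0.01,  # stop when the impurity decrease (information gain) is low
    max_depth=7,                 # hard cap on depth
)
```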
How many levels of a decision tree are understandable?
max. 7
What is a Support Vector Machine (SVM)?
A type of supervised learning algorithm that can be used for classification and regression tasks.
SVMs work by finding the hyperplane that best separates the different classes in the feature space while maximising the margin to the closest data points of the different classes. The points closest to the hyperplane are called support vectors and play a key role in determining the position of the hyperplane. SVMs are often used for high-dimensional data and can be regularised to avoid overfitting.
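A minimal sketch with scikit-learn (the toy dataset is illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two linearly separable classes
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear").fit(X, y)

# The points closest to the hyperplane, which define its position
print("number of support vectors:", len(clf.support_vectors_))
```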
Is linear separation always possible?
What is the solution if separation is not possible, or would lead to a badly generalising model?
Soft margin
Hyperplane that splits "as cleanly as possible/desirable"
while maximising the margin
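In scikit-learn's SVC the softness of the margin is controlled by the C parameter: a small C tolerates more margin violations (softer margin), a large C penalises them heavily. A sketch on overlapping classes (parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Overlapping classes, so a perfectly clean split is impossible
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           class_sep=0.8, random_state=0)

soft = SVC(kernel="linear", C=0.01).fit(X, y)   # wide, forgiving margin
hard = SVC(kernel="linear", C=100.0).fit(X, y)  # narrow margin, fits training data tightly
print(soft.score(X, y), hard.score(X, y))
```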
How can I still separate if it is not possible in my 2D space?
Linear separation is more likely to work in a higher-dimensional space
Idea of SVMs: projection of data into a higher-dimensional space
Add an additional coordinate (see the sketch below)
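A classic illustration: points on two concentric circles are not linearly separable in 2D, but adding x1^2 + x2^2 (the squared radius) as a third coordinate makes them separable by a plane. A sketch with NumPy:

```python
import numpy as np
from sklearn.datasets import make_circles

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add the squared radius as an extra coordinate: (x1, x2) -> (x1, x2, x1^2 + x2^2)
X3 = np.column_stack([X, (X ** 2).sum(axis=1)])

# In 3D the inner and outer circle sit at different "heights",
# so a horizontal plane separates them
print("inner circle mean height:", X3[y == 1, 2].mean())
print("outer circle mean height:", X3[y == 0, 2].mean())
```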
What's this: SVM kernels?
To overcome the limitation of not linearly separable data, the SVM algorithm can be extended by using a kernel trick. A kernel is a mathematical function that maps the input data into a higher-dimensional space, where the data may be more easily separated. By using a kernel, the SVM algorithm can find a non-linear decision boundary that can separate the different classes in the data.
Name common Kernels
Quadratic Kernel
Radial Basis Kernel
General Polynomial Kernel (arbitrary degree)
Linear Kernel (=no kernel)
(when could this be the optimal kernel?)
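These names correspond directly to the kernel parameter of scikit-learn's SVC; a quick comparison on the circles data from above (for the linear-kernel question: it can be optimal when the data is already near-linearly separable, e.g. sparse, high-dimensional text data, cf. the SVM properties below):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Same data, different kernels; only 'poly' uses the degree argument
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel, degree=2).fit(X, y)  # degree=2 -> quadratic kernel
    print(kernel, clf.score(X, y))
```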
How does a multi-class SVM work? And what are possible strategies?
Multi-class SVM is an extension of the standard SVM algorithm that can handle multiple classes by training multiple binary classifiers. There are several strategies to perform multi-class classification with SVMs, including:
One vs. All
- Binary classifiers that distinguish between class i and the rest
For example, in a 3-class problem (A, B, C), three binary classifiers are trained: one to separate class A from classes B and C, one to separate class B from classes A and C, and one to separate class C from classes A and B. The final prediction is based on the classifier that has the highest confidence score.
One vs. One
- Build binary classifier for each pair of classes
- Class with highest number of votes wins
For example, in a 3-class problem (A, B, C), three binary classifiers are trained: one to separate class A from class B, one to separate class A from class C, and one to separate class B from class C. The final prediction is based on a voting system, where the class that wins the majority of the binary classifiers is chosen.
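Both strategies exist as generic wrappers in scikit-learn (SVC itself uses one-vs-one internally); a sketch:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes -> 3 classifiers under either strategy

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # class i vs. the rest
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)   # one per pair of classes
print(ovr.score(X, y), ovo.score(X, y))
```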
What are properties of SVMs?
High classification accuracy
Linear kernels: Good for sparse, high dimensional data, e.g. text mining
Much research has been directed at SVMs
Implementations available in open-source software
What is k-nearest neighbor (k-NN) algorithm?
Finds the k number of closest training examples
Uses majority class or average value of those examples to predict for new data point
Distance metric (e.g. Euclidean) used to measure similarity
k is a hyperparameter that can be adjusted for performance
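A minimal from-scratch sketch of the idea (Euclidean distance, majority vote; names and data are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the class of x_new as the majority class of its k nearest neighbours."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to all points
    nearest = np.argsort(distances)[:k]                  # indices of the k closest examples
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5, 6]), k=3))  # -> 1
```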
What are the advantages of lazy learning?
Lazy learning algorithms do not require a training phase, so there is no up-front training cost.
Lazy learning algorithms are memory efficient as they only store the training dataset and not the model.
Lazy learning algorithms can handle large datasets with high dimensionality.
What's the formula for precision?
Precision: P = TP / (TP + FP)
TP = true positives
FP = false positives
Formula for recall? What is recall?
How much more we could have had
R = TP / (TP + FN)
TP = true positives
FN = false negatives
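Both formulas as a tiny sketch (the counts are made up):

```python
def precision(tp, fp):
    """Of everything predicted positive, how much really was positive?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything actually positive, how much did we find?"""
    return tp / (tp + fn)

# 8 true positives, 2 false positives, 4 false negatives
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # ~0.667
```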
Why is Generalization an Issue?
Generalization is an issue in machine learning because a model that has been trained on a specific dataset may not perform well when applied to new, unseen data. This is because the model has learned patterns and relationships specific to the training dataset, which may not be representative of the population as a whole.
What is Data Augmentation?
Create new input data from existing instances
- Adding noise
- Distorting data (scaling, sampling, rotating, …)
- Combining individual instances
Be aware of potential systematic bias
Only for training data (possibly for validation data)
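A minimal sketch of the noise variant for numeric features (NumPy; the noise scale is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))  # stand-in for real training features

# New instances = existing instances plus small Gaussian noise
noise = rng.normal(scale=0.05, size=X_train.shape)
X_augmented = np.vstack([X_train, X_train + noise])  # labels are duplicated alongside
print(X_augmented.shape)  # (200, 5)
```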
What is k-fold cross-validation (k-CV)?
Validation of the model -> It involves dividing the data into k subsets (folds) and training the model k times, each time using a different subset as the validation set and the remaining subsets as the training set. The performance of the model is then averaged across all k iterations.
k-CV is useful to obtain a more robust estimate of the model performance.
k-CV is useful to use the whole dataset for training and validating.
k-CV is useful to avoid overfitting by rotating the validation set.
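With scikit-learn this is a one-liner; a sketch with k = 5 (model and dataset are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5 folds: each fold serves as the validation set exactly once
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean(), scores.std())  # performance averaged across the k iterations
```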
What is Bootstrapping?
A bootstrap sample is a random sample drawn with replacement from the data
The goal of bootstrapping is to estimate the distribution of a sample statistic by repeatedly drawing samples from the original dataset and calculating the statistic of interest for each sample
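A minimal NumPy sketch estimating the distribution of the sample mean (note the sampling is with replacement):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10, scale=2, size=50)  # stand-in for the original dataset

# Draw many bootstrap samples and compute the statistic of interest on each
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(1000)]

print(np.mean(boot_means), np.std(boot_means))  # estimated distribution of the mean
```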
What is Statistical Significance Testing?
Statistical Significance Testing is useful to determine whether an observed difference is real or due to chance.
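A common pattern is to compare the per-fold scores of two models with a paired t-test; a sketch using scipy (the scores and the 0.05 level are illustrative):

```python
from scipy.stats import ttest_rel

# Per-fold accuracies of two models on the same CV folds (made-up numbers)
scores_a = [0.81, 0.79, 0.84, 0.80, 0.82]
scores_b = [0.78, 0.75, 0.80, 0.77, 0.79]

t_stat, p_value = ttest_rel(scores_a, scores_b)  # paired, since the folds match up
print(p_value < 0.05)  # True -> the difference is unlikely to be due to chance alone
```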
How to evaluate classification results? Overview?
- precision / recall
- micro / macro performance
- baseline and bias
- k-fold cross-validation; leave-one-out
- statistical significance AND relevance
- human validation; consistency in evaluator decisions
- how fine-granular is the evaluation scale?