What is classification in general?
systematic arrangement in groups or categories
according to established criteria
What are patterns?
distinctive structures in the data
-> can basically be found everywhere where data (or only metadata) is available
What can patterns be w.r.t. the different ML methods?
indicate a trend (regression)
refer to a certain known type (classification)
indicate similarities between different data points (clustering)
On what ability is pattern recognition based?
ability to generate abstract rules for patterns
from large amount of data
What information can be gained from patterns?
information about data structure (clustering)
forecasts (regression)
categorize new data (classification)
What is the general approach to classification?
have data (features)
feed it into a classifier
-> results in classes
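A minimal sketch of this data -> classifier -> classes flow (scikit-learn and the Iris dataset are illustrative choices, not from the card):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)         # data (features) with labels
clf = DecisionTreeClassifier().fit(X, y)  # feed into a classifier
print(clf.predict(X[:3]))                 # -> results in classes: [0 0 0]
```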
What is the advantage of ML to classical methods (e.g. decision tree)?
decision tree requires a-priori knowledge to formulate classification rules
advantages ML:
automatic generation of a-priori knowledge (feature learning vs feature definition)
automatic generation of complex classification rules
=> suitable for extremely large datasets
What is an example classification can be used for in e.g. videos?
object classification
-> object detection
-> object tracking
What is the formal definition for a classifier?
Classifier C
Model M with parameters Theta
Input: Feature Space X = [x1, x2, …, xN]
Output: (Predicted) Label / Classes Y
Dataset: D = {(X1, Y1), (X2, Y2), …}
Training
Classification
How is training formally defined?
given training dataset O ⊆ D with known labels
optimize model parameters Theta by minimizing prediction errors
How is classification defined?
evaluate C_M(Theta) at (unknown / new) element X
receive class prediction Y
How does supervised training work?
present labeled training data to the model
-> compare predictions with the known labels
-> adjust parameters Theta to reduce the prediction error
How can a classifier be interpreted in the feature space?
decision boundaries in the feature space
separating the different classes
What is required for supervised learning?
labeled data
What is train test split?
usually approx. 80/20 for train/test
train dataset -> machine learns, optimizing its parameters
test dataset -> (unseen data) used to check quality of the model
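A minimal sketch of such a split with scikit-learn (dataset and ratio are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 80/20 split; stratify=y keeps the class distribution alike in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```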
What are some quality measures for classifiers?
classification accuracy or classification error
loss function
compactness of the model
decision tree size, number of decision rules
interpretability of the model
insights and understanding of the data provided by the model
efficiency
time to generate the model (training time)
time to apply the model (prediction time)
scalability for large databases
efficiency in disk-resident databases
robustness
robust against noise or missing values
How can one validate classifiers?
k-fold cross validation
decompose data set evenly into k subsets of (nearly) equal size
iteratively use k-1 partitions as training data and remaining single partition as test data
What are stratified folds?
an additional requirement for k-fold cross validation
-> class distribution in training and test set should represent the class distribution in whole dataset (or at least in training set)
=> e.g. training set only cats, testing set only dogs -> not good…
What is the standard for cross validation?
10-fold stratified cross validation
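A sketch of exactly this standard in scikit-learn (the logistic regression model is an arbitrary stand-in classifier):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# each of the 10 folds mirrors the class distribution of the whole dataset
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean(), scores.std())  # mean accuracy over the 10 test folds
```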
Structure of Confusion Matrix
one row per actual class, one column per predicted class
-> cell (i, j): number of objects of class i classified as class j
-> diagonal: correct classifications
How to calculate Recall? Interpretation?
TP / (TP + FN)
fraction of test objects of class X which have been identified correctly
=> share of a specific class that was classified correctly
How to calculate Precision? Interpretation?
TP / (TP + FP)
what fraction of the objects classified as class X actually belong to class X?
How to calculate Specificity? Interpretation?
TN / (TN + FP)
counterpart of recall for the other classes
-> how many objects of other classes were correctly not assigned to class X?
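The three measures side by side, computed from hypothetical counts for one class X:

```python
# hypothetical binary confusion matrix entries for class X
TP, FN = 40, 10   # objects of class X: correctly / incorrectly classified
FP, TN = 5, 45    # objects of other classes: wrongly assigned to X / not

recall      = TP / (TP + FN)   # share of class X that was found: 0.8
precision   = TP / (TP + FP)   # share of "X" predictions that are right: ~0.89
specificity = TN / (TN + FP)   # share of non-X correctly kept out of X: 0.9
print(recall, precision, specificity)
```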
What methods for classification exist? (examples)
decision trees
logistic regression
nearest neighbors
support vector machine
neural networks
How can linear regression be used for classification?
=> logistic regression
use linear regression to create a decision boundary…
Formula Sigmoid Function
sigmoid(x) = 1 / (1 + e^(-x))
What are pros of logistic regression?
pros:
implementation: easy to use
probabilistic: probability of an object being in a certain class
computation: quick training phase
insights: produces understandable models
What are cons of logistic regression?
cons:
linearity: hard to adapt to non-linear problems
overfitting: training data has to be well chosen
What is a problem of linear regression w.r.t. classification? How can it be handled?
its continuous output has to be mapped to discrete class values
-> this blurs the class boundaries (… which are the most relevant element for classification)
=> using the sigmoid, the steep flank of its S shape makes it possible to build a clear class boundary
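A small sketch showing that logistic regression is just a linear score squashed by the sigmoid (toy data and scikit-learn as an illustrative implementation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # steep S-shaped flank around z = 0 -> clear class boundary at 0.5
    return 1.0 / (1.0 + np.exp(-z))

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           random_state=0)
clf = LogisticRegression().fit(X, y)

z = clf.decision_function(X[:3])        # linear part: w.x + b
print(sigmoid(z))                       # manual sigmoid of the linear score
print(clf.predict_proba(X[:3])[:, 1])   # same probabilities from sklearn
```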
What is the idea of nearest neighbor?
classify new object based on its nearest neighbor(s)
Features of nearest neighbors?
instance based (memory based) “learning method” -> the dataset is the model…
=> no model is “generated”
process training data only when a new object is to be classified -> “lazy evaluation”
=> easy to use but memory and time inefficient for large datasets
What variants are there for Nearest Neighbor?
NN classifier
classify based on the nearest neighbor only
k-NN classifier
classify based on k nearest neighbors (k>1)
Weighted k-NN classifier
classify based on k-NN, weighted by distance
Mean-based NN classifier
classify based on distance to mean position of classes
What are some considerations in choosing k in k-NN?
generalization vs. overfitting
-> large k: many objects from different classes
-> small k: sensitivity against outliers
practice: 1 << k < 10
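A quick way to inspect this generalization/overfitting tradeoff empirically (Iris and the chosen k values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# small k: sensitive to outliers; large k: pulls in objects of other classes
for k in (1, 3, 5, 9):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, acc.mean().round(3))
```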
What are some considerations in weighted k-NN?
how to weight neighbors?
distance to the neighbor?
weight_i = (1/distance_i^2)
frequency of the neighbor's class
w_i = 1/frequency_i
=> w.r.t. whole dataset
-> if rarer datapoints are near -> give them more weight…
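A sketch of the 1/distance² weighting via a custom weight function in scikit-learn (the epsilon guard against division by zero is my addition):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def inverse_squared_distance(distances):
    # weight_i = 1 / distance_i^2; epsilon avoids division by zero
    return 1.0 / (distances ** 2 + 1e-9)

X = np.array([[0.0], [0.2], [0.9], [1.0], [1.1]])
y = np.array([0, 0, 1, 1, 1])

clf = KNeighborsClassifier(n_neighbors=5, weights=inverse_squared_distance)
clf.fit(X, y)
print(clf.predict([[0.5]]))  # nearby class-0 points dominate -> [0]
```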
What are pros of NN?
applicability: easy to calculate distances
accuracy: great results for many applications
incremental: easy adoption of new training data
robust: Copes with noise by averaging (k-NN)
What are cons of NN?
cons:
efficiency: processing grows with training data (O(n))
can be reduced to O(log n) with an index structure (requires training phase)
dimensionality: not every dimension is relevant
weight dimensions (scale axes)
What is a neutral consideration about NN?
does not produce explicit knowledge about classes
What are SVM?
Hyperplane separating the feature space
-> interpreted as decision boundary…
How does training and classification in SVM roughly work?
training: calculate hyperplane
classification: compute sign / distance w.r.t. hyperplane
What are maximum margin hyperplane SVMs ?
objective:
maximize distance to hyperplane
distance to hyperplane at least delta (Margin) for all datapoints
high generalization
maximally stable
small number of support vectors
only depends on objects with distance delta
What is the general formal definition of SVM?
we have training data (points in the feature space… in our example: 2D space…)
hyperplane: wT X + b = 0
with w: normal vector
b/||w||: offset from origin
margin: delta = 1/||w||
training: minimize ||w||
classify new sample based on which side of the SVM it lies…
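A minimal sketch of both steps on invented, separable toy data (scikit-learn as an illustrative implementation):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear").fit(X, y)   # training: compute the hyperplane

w, b = svm.coef_[0], svm.intercept_[0]
x_new = np.array([4, 4])
print(np.sign(w @ x_new + b))          # classification: side of wT X + b = 0
print(svm.support_vectors_)            # only these points define the plane
```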
What is a problem with maximum margin SVM?
linear separation not always optimal or even possible
-> tradeoff between error and margin
solution: allow classification errors in order to maximize the margin (soft margin…)
=> some datapoints may lie on the wrong side…
What is another solution when data not linearily separable?
e.g. too many errors with soft margin
-> use space transformation
=> transform into a higher dimensional space (sigma(X))
separation with linear hyperplane wT sigma(X) + b = 0 in the higher space
inverse transform yields a non-linear hyperplane in the original space…
example: quadratic transformation (x2 = x1^2)
-> hyperplane becomes polynomial of degree 2…
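A sketch of exactly this quadratic lift: invented 1D data that no single threshold separates becomes linearly separable in (x, x²):

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])    # class 1 lies between class 0

X2 = np.column_stack([x, x ** 2])      # sigma(x) = (x, x^2): lift to 2D
svm = SVC(kernel="linear").fit(X2, y)  # linear hyperplane in the 2D space

x_new = np.array([1.0, -2.5])
print(svm.predict(np.column_stack([x_new, x_new ** 2])))  # [1 0]
```

Transformed back to 1D, the hyperplane x² = c becomes the two boundaries x = ±√c, i.e. a polynomial of degree 2.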
Workflow of SVM space transformation?
transform datapoints
lower to higher dimension
computational complex
SVM training -> get hyperplane
transform hyperplane
higher to lower dimension
What are SVM kernel machines?
replacement for space transformation
-> replace dot product (in training and classification) with a non-linear kernel function
-> thus avoiding the space transformation
What are some kernel functions for the kernel trick?
linear: K(x, z) = xT z
polynomial: K(x, z) = (xT z + c)^d
RBF / Gaussian: K(x, z) = exp(-||x − z||² / (2σ²))
sigmoid: K(x, z) = tanh(κ · xT z + c)
Example SVM kernel trick
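Since the original example image is not included, here is a small stand-in: concentric circles that only a non-linear kernel separates (data and gamma are illustrative):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# concentric circles: not linearly separable in the original space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# RBF kernel replaces the dot product -> non-linear decision boundary,
# without ever transforming the data explicitly
svm = SVC(kernel="rbf", gamma=1.0).fit(X, y)
print(svm.score(X, y))  # close to 1.0 on this toy data
```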
How does multi class SVM work?
two approaches:
1 vs rest
1 vs 1
How does 1 vs rest work?
when evaluating a new datapoint -> iteratively consider a single class vs all remaining classes
-> then calculate the hyperplane and check on which side of it the new datapoint lies (does it lie in the class? does it lie in the rest?)
=> thus, we can see in which class it lies and in which it doesn't…
How does 1 vs 1 work?
similar to 1 vs rest
-> but do it for each pair of classes and disregard all datapoints of other classes for each pair
-> for each pair, evaluate to which class the datapoint would belong…
-> do a majority vote…
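Both strategies as a sketch in scikit-learn (Iris with its 3 classes as an illustrative dataset):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes

# 1 vs rest: one hyperplane per class vs all others -> k classifiers
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

# 1 vs 1: one hyperplane per class pair -> k(k-1)/2 classifiers
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ovr.estimators_), len(ovo.estimators_))  # 3 and 3 (for k = 3)
```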
Why use kernel trick?
transformation from and to the higher space is computationally complex…
What are pros of SVM?
accuracy: high classification rate
effective: even when number of dimensions > number of samples
robust: low tendency to overfitting
compact models: “plane in space”…
versatile: applicability of different kernel functions
What are cons of SVM?
efficiency: long training phase
complexity: high implementation effort
black-box: hard to interpret models
What are some classification applications?
big data
find patterns
make data usable
image classification
handwritten digits
x-rays
music classification
shazam
speech/language classification
siri/alexa/echo
fault detection
quality control during production
What can classification be used for in automotive?
perception
camera outputs pixel array
-> classification adds value to each pixel
-> pixel segmentation
What is the workflow of vehicle detection and tracking?
get training data
extract features from images
train classification model based on features
take one video frame and classify features of sub-images
merge classified areas and create bounding box
What does the kernel trick allow?
allows calculating the relationship (dot product) between two vectors in a higher dimensional space
without performing the full space transformation.
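A worked example of this equivalence: for the polynomial kernel K(x, z) = (xT z)² in 2D, the kernel value equals the dot product under the explicit map sigma(x) = (x1², √2·x1·x2, x2²):

```python
import numpy as np

def phi(v):
    # explicit quadratic feature map: R^2 -> R^3
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

print(phi(x) @ phi(z))   # dot product in the higher space: 16.0
print((x @ z) ** 2)      # kernel trick, no transformation:  16.0
```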
Pipeline detect and classify cars in video stream?
extract features from training data
train classifier
apply sliding window
classify sub-images
merge classification result of sub-images
consider classification of previous frames
output bounding box with label
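A hedged sketch of the sliding-window step of this pipeline; `classifier` is a hypothetical pre-trained model whose predict takes a flattened window and returns 1 for "car":

```python
import numpy as np

def sliding_window_detect(frame, classifier, win=64, step=32):
    # slide a win x win window over the frame and classify each sub-image
    boxes = []
    h, w = frame.shape[:2]
    for y0 in range(0, h - win + 1, step):
        for x0 in range(0, w - win + 1, step):
            window = frame[y0:y0 + win, x0:x0 + win]
            if classifier.predict(window.reshape(1, -1))[0] == 1:
                boxes.append((x0, y0, x0 + win, y0 + win))
    # merging overlapping boxes and smoothing over previous frames follow here
    return boxes
```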