What is classification in general?
systematic arrangement in groups or categories
according to established criteria
What are patterns?
distinctive structures in the data
-> can be found basically wherever data (or even only metadata) is available
What can patterns be w.r.t. the different ML methods?
indicate a trend (regression)
refer to a certain known type (classification)
indicate similarities between different data points (clustering)
On what ability is pattern recognition based?
ability to generate abstract rules for patterns
from large amounts of data
What information can be gained from patterns?
information about data structure (clustering)
forecasts (regression)
categorize new data (classification)
What is the general approach to classification?
have data (features)
pour into classifier
-> result in classes
What is the advantage of ML over classical methods (e.g. decision tree)?
decision tree requires a-priori knowledge to formulate classification rules
advantages ML:
automatic generation of a-priori knowledge (feature learning vs feature definition)
automatic generation of complex classification rules
=> suitable for extremely large datasets
What is an example of what classification can be used for, e.g. in videos?
object classification
-> object detection
-> object tracking
What is the formal definition for a classifier?
Classifier C
Model M with parameters Theta
Input: Feature space X = [x1, x2, …, xN]
Output: (Predicted) Label / Classes Y
Dataset: D = {(X1, Y1), (X2, Y2), …}
Training
Classification
How is training formally defined?
given a training dataset O ⊆ D with known labels
optimize model parameters Theta by minimizing prediction errors
How is classification defined?
evaluate C_M(theta) at (unknown / new) element X
receive class prediction Y
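The two steps can be sketched with a deliberately tiny model: a one-dimensional threshold classifier whose only parameter theta is chosen so that the number of prediction errors on the training data is minimal. The function names `train` and `classify` and the toy data are assumptions for illustration, not part of the original notes.

```python
def classify(theta, x):
    """Evaluate the classifier C_M(theta) at element x: class 1 if x > theta, else 0."""
    return 1 if x > theta else 0


def train(dataset):
    """Pick the theta that minimizes the number of prediction errors on labeled data."""
    xs = sorted(x for x, _ in dataset)
    # Candidate thresholds: midpoints between neighboring feature values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(theta):
        return sum(classify(theta, x) != y for x, y in dataset)

    return min(candidates, key=errors)


# Hypothetical training data: (feature, label) pairs with known labels.
training_data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
theta = train(training_data)        # training: optimize the model parameter
prediction = classify(theta, 6.5)   # classification: evaluate at a new element
```

Real models have many parameters and use a loss function instead of an exhaustive search, but the split into "optimize Theta" and "evaluate at X" stays the same.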
How does supervised training work?
How can a classifier be interpreted in the feature space?
decision boundaries in the feature space
separating the different classes
What is required for supervised learning?
labeled data
What is train test split?
usually approx. 80/20 for train/test
train dataset -> machine learns, optimizing its parameters
test dataset -> (unseen data) used to check quality of the model
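A minimal sketch of such a split, using only the standard library. The helper name `split_train_test`, the fixed seed and the toy data are assumptions for illustration.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into a training and a test part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # shuffle a copy, reproducibly
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = [(i, i % 2) for i in range(100)]     # hypothetical (feature, label) pairs
train_set, test_set = split_train_test(data)
```

Shuffling before the split matters when the data is stored in some order (e.g. sorted by class); otherwise the test set may not resemble the training set.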
What are some quality measures for classifiers?
classification accuracy or classification error
loss function
compactness of the model
decision tree size, number of decision rules
interpretability of the model
insights and understanding of the data provided by the model
efficiency
time to generate the model (training time)
time to apply the model (prediction time)
scalability for large databases
efficiency in disk-resident databases
robustness
robust against noise or missing values
How can one validate classifiers?
k-fold cross validation
decompose data set evenly into k subsets of (nearly) equal size
iteratively use k-1 partitions as training data and remaining single partition as test data
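The procedure can be sketched as a small generator. The name `k_fold_splits` and the round-robin way of forming the folds are assumptions; any even partition works.

```python
def k_fold_splits(samples, k):
    """Yield (train, test) pairs; each of the k folds is the test set exactly once."""
    # Distribute samples round-robin so fold sizes differ by at most one.
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


samples = list(range(10))
splits = list(k_fold_splits(samples, k=5))
```

The quality measure (e.g. accuracy) is computed once per split and then averaged over the k runs.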
What is stratified folds?
an additional requirement for k-fold cross validation
-> class distribution in training and test set should represent the class distribution of the whole dataset (or at least of the training set)
=> e.g. training set only cats, testing set only dogs -> not good…
What is the standard for cross validation?
10-fold stratified cross validation
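One simple way to obtain stratified folds is to deal the members of each class round-robin across the folds. This is a sketch under that assumption; the function name `stratified_folds` and the cat/dog toy data are hypothetical.

```python
from collections import Counter, defaultdict


def stratified_folds(labeled_samples, k):
    """Split (x, label) pairs into k folds that each mirror the overall class mix."""
    by_class = defaultdict(list)
    for sample in labeled_samples:
        by_class[sample[1]].append(sample)
    folds = [[] for _ in range(k)]
    # Deal the members of each class round-robin across the folds.
    for members in by_class.values():
        for i, sample in enumerate(members):
            folds[i % k].append(sample)
    return folds


# Hypothetical dataset: 20 cats and 10 dogs (ratio 2:1).
dataset = [(i, "cat") for i in range(20)] + [(i, "dog") for i in range(10)]
folds = stratified_folds(dataset, k=5)
fold_counts = [Counter(label for _, label in fold) for fold in folds]
```

Every fold here contains 4 cats and 2 dogs, so each train/test combination keeps the 2:1 ratio of the whole dataset.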
Structure of Confusion Matrix
How to calculate Recall? Interpretation?
TP / (TP + FN)
fraction of test objects of class X which have been identified correctly
=> share of specific class that was classified correctly
How to calculate Precision? Interpretation?
TP / (TP + FP)
what fraction of the objects classified as class X actually belong to class X?
How to calculate Specificity? Interpretation?
TN / (TN + FP)
counterpart of recall for the other classes
-> what fraction of the objects not belonging to class X were correctly classified as not X?
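The three formulas can be checked on a small example by counting TP, FN, FP and TN for one class against the rest. The function name `binary_metrics` and the toy labels are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive):
    """Recall, precision and specificity for one class ("positive") vs. the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Note: a denominator of zero (e.g. no positive predictions) is not handled here.
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }


# Hypothetical test results, evaluated for the class "cat": TP=3, FN=1, FP=1, TN=5.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "cat", "dog", "cat", "dog", "dog", "dog", "dog", "dog"]
metrics = binary_metrics(y_true, y_pred, positive="cat")
```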
What methods for classification exist? (examples)
decision trees
logistic regression
nearest neighbors
support vector machine
neural networks
How can linear regression be used for classification?
=> logistic regression
use linear regression to create decision boundary…
Formula of the sigmoid function
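The standard logistic sigmoid is sigma(z) = 1 / (1 + e^(-z)). A minimal sketch of it in code, with the sample inputs chosen only for illustration:

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real number to the open interval (0, 1)."""
    # For illustration only: very negative z would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-z))


values = [sigmoid(z) for z in (-4.0, 0.0, 4.0)]
```

The output is exactly 0.5 at z = 0 and approaches 0 or 1 quickly on either side, which is what makes it usable as a class probability.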
What are pros of logistic regression?
pros:
implementation: easy to use
probabilistic: probability of an object being in a certain class
computation: quick training phase
insights: produces understandable models
What are cons of logistic regression?
cons:
linearity: hard to adapt to non-linear problems
overfitting: training data has to be well chosen
What is a problem of linear regression w.r.t. classification? How can it be handled?
continuous output has to be mapped to discrete class values
-> blurs class boundaries (… which are the most relevant element for classification)
=> use a sigmoid: the steep flank of the S-shaped curve makes it possible to build a clear class boundary
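A minimal sketch of this idea: a linear function w * x + b is passed through the sigmoid, and w and b are fitted by plain gradient descent on the log loss. The learning rate, the number of epochs and the toy data (centered around zero to keep the optimization well behaved) are assumptions, not values from the notes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical 1-D data: class 0 on the left, class 1 on the right.
xs = [-3.5, -2.5, -1.5, 1.5, 2.5, 3.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)


def predict(x):
    """Discrete class: 1 if the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


boundary = -b / w   # the x where the probability is exactly 0.5
```

The probability output stays available through `sigmoid(w * x + b)`; the threshold at 0.5 turns it into the clear class boundary.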
What is the idea of nearest neighbor?
classify new object based on its nearest neighbor(s)
Features of nearest neighbors?
instance based (memory based) “learning method” -> dataset is model…
=> no model is “generated”
process training data when new object should be classified -> “lazy evaluation”
=> easy to use but memory and time inefficient for large datasets
What variants are there for Nearest Neighbor?
NN classifier
classify based on nearest neighbor only
k-NN classifier
classify based on k nearest neighbors (k>1)
Weighted k-NN classifier
classify based on k-NN, weighted by distance
Mean-based NN classifier
classify based on distance to mean position of classes
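The NN and k-NN variants differ only in k, as this sketch shows. Euclidean distance, the function name `knn_classify` and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(training_data, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # "Lazy evaluation": all work happens at query time; the dataset is the model.
    neighbors = sorted(training_data, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data: (feature vector, label).
training_data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
label_1nn = knn_classify(training_data, (2.5, 2.0), k=1)   # NN classifier
label_3nn = knn_classify(training_data, (6.0, 6.5), k=3)   # k-NN classifier
```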
What are some considerations in choosing k in k-NN?
generalization vs. overfitting
-> large k: many objects from different classes
-> small k: sensitivity against outliers
practice: 1 << k < 10
What are some considerations in weighted k-NN?
how to weight neighbors?
distance to the neighbor?
w_i = 1/distance_i^2
frequency of the neighbor's class
w_i = 1/frequency_i
=> w.r.t. whole dataset
-> if rarer datapoints are near -> give them more weight…
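A sketch of the distance-based weighting w_i = 1/distance_i^2. The small `eps` term, the function name and the toy data are assumptions; the example is built so that one very close neighbor outvotes two distant ones.

```python
import math
from collections import defaultdict


def weighted_knn_classify(training_data, query, k=3, eps=1e-12):
    """Vote among the k nearest neighbors, each weighted by 1 / distance^2."""
    by_distance = sorted(
        (math.dist(features, query), label) for features, label in training_data
    )
    scores = defaultdict(float)
    for distance, label in by_distance[:k]:
        scores[label] += 1.0 / (distance ** 2 + eps)  # eps avoids division by zero
    return max(scores, key=scores.get)


training_data = [
    ((0.0, 0.0), "A"),
    ((3.0, 0.0), "B"), ((0.0, 3.0), "B"),
]
# Unweighted 3-NN would say "B" (2 votes vs. 1); the weights favor the close "A".
label = weighted_knn_classify(training_data, (0.5, 0.5), k=3)
```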
What are pros of NN?
applicability: easy to calculate distances
accuracy: great results for many applications
incremental: easy to add new training data
robust: copes with noise by averaging (k-NN)
What are cons of NN?
cons:
efficiency: query cost grows with the amount of training data (O(n))
can be reduced to O(log n) with an index structure (requires training phase)
dimensionality: not every dimension is relevant
weight dimensions (scale axes)
What is a neutral consideration about NN?
does not produce explicit knowledge about classes
What are SVMs?
Hyperplane separating the feature space
-> interpreted as decision boundary…
How does training and classification in SVM roughly work?
training: calculate hyperplane
classification: compute sign / distance w.r.t. hyperplane
What are maximum margin hyperplane SVMs?
objective:
maximize distance to hyperplane
distance to hyperplane at least delta (Margin) for all datapoints
high generalization
maximally stable
small number of support vectors
only depends on objects with distance delta
What is the general formal definition of SVM?
we have training data (points in hyperspace… our example 2d space…)
hyperplane: w^T X + b = 0
with w: normal vector
b/||w||: offset from origin
margin: delta = 1/||w||
training: minimize ||w||
classify new sample based on which side of the hyperplane it lies…
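Once w and b are known, classification is just the sign of w^T x + b. The sketch below assumes an already trained hyperplane; the concrete values of `w` and `b` are hypothetical and no training is shown.

```python
import math


def decision_value(w, b, x):
    """Value of w^T x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical, already trained hyperplane in 2-D: x1 + x2 - 4 = 0.
w, b = (1.0, 1.0), -4.0
norm_w = math.sqrt(sum(wi * wi for wi in w))
margin = 1.0 / norm_w                                   # delta = 1 / ||w||
side = classify(w, b, (3.0, 3.0))                       # w^T x + b = 2, so class +1
distance = decision_value(w, b, (3.0, 3.0)) / norm_w    # signed distance to the plane
```

Since the margin is 1/||w||, making ||w|| smaller during training makes the margin larger.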
What is a problem with maximum margin SVM?
linear separation not always optimal or even possible
-> tradeoff between error and margin
solution: allow classification errors to maximize the margin (soft margin…)
=> some may lie on wrong side…
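A common way to express this tradeoff is the soft-margin objective 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w^T x_i + b)), where C weights errors against margin size. The notes do not give a formula, so this formulation, the parameter name `C` and the toy data are assumptions.

```python
def soft_margin_objective(w, b, data, C=1.0):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y * (w^T x + b))."""
    norm_sq = sum(wi * wi for wi in w)
    hinge = 0.0
    for x, y in data:                       # y is +1 or -1
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hinge += max(0.0, 1.0 - y * score)  # zero for points outside the margin
    return 0.5 * norm_sq + C * hinge


# Hypothetical 1-D data; the last point lies on the wrong side of x = 0.
data = [((2.0,), 1), ((-2.0,), -1), ((0.5,), -1)]
objective = soft_margin_objective(w=(1.0,), b=0.0, data=data, C=1.0)
```

A large C punishes misclassified points heavily (narrow margin, few errors); a small C tolerates them in exchange for a wider margin.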
What is another solution when the data is not linearly separable?
e.g. too many errors with soft margin
-> use space transformation
=> transform into a higher-dimensional space (sigma(X))
separation with a linear hyperplane w^T sigma(X) + b = 0 in the higher-dimensional space
inverse transform yields a non-linear decision boundary in the original space…
example: quadratic transformation (x2 = x1^2)
-> hyperplane becomes polynomial of degree 2…
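The quadratic example can be made concrete: one-dimensional data that no single threshold separates becomes linearly separable after adding x2 = x1^2. The toy data and the threshold 2.5 are assumptions chosen for illustration.

```python
def transform(x1):
    """Quadratic space transformation: 1-D point x1 -> 2-D point (x1, x1^2)."""
    return (x1, x1 ** 2)


# Hypothetical 1-D data: class 1 lies between two groups of class 0,
# so no single threshold on x1 separates the classes.
data = [(-3.0, 0), (-2.0, 0), (-1.0, 1), (0.0, 1), (1.0, 1), (2.0, 0), (3.0, 0)]


def classify(x1, threshold=2.5):
    """In the transformed space, the horizontal line x2 = threshold separates the classes."""
    _, x2 = transform(x1)
    return 1 if x2 < threshold else 0


predictions = [classify(x1) for x1, _ in data]
```

Mapped back to the original space, the line x2 = 2.5 corresponds to the condition x1^2 < 2.5, i.e. a polynomial of degree 2 in x1.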
Workflow of SVM space transformation?
transform datapoints
lower to higher dimension
computationally complex
SVM training -> get hyperplane
transform hyperplane
higher to lower dimension
What are SVM kernel machines?
replacement of the space transformation
-> replace the dot product (in training and classification) with a non-linear kernel function
-> thus avoiding the space transformation
What are some kernel functions for the kernel trick?
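Commonly used kernel functions include the linear, the polynomial and the radial basis function (RBF) kernel. The sketch below writes them out; the parameter names and default values (`degree`, `coef0`, `gamma`) are assumptions, not values from the notes.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """Plain dot product: equivalent to no transformation at all."""
    return dot(x, y)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: depends only on the distance between x and y."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x, y = (1.0, 2.0), (3.0, 0.5)
k_lin = linear_kernel(x, y)        # 1*3 + 2*0.5 = 4.0
k_poly = polynomial_kernel(x, y)   # (4 + 1)^2 = 25.0
k_rbf = rbf_kernel(x, x)           # identical points give 1.0
```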
Example SVM kernel trick
How does multi class SVM work?
two approaches:
1 vs rest
1 vs 1
How does 1 vs rest work?
when evaluating a new datapoint -> iteratively consider a single class vs. all remaining classes
-> calculate the hyperplane for that split and check on which side of it the new datapoint lies (does it lie in the class? does it lie in the rest?)
=> thus, we can see in which class it lies and in which it doesn't…
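A sketch of the 1 vs rest decision with one hyperplane per class. The hyperplanes are hypothetical, hand-picked values standing in for trained SVMs, and picking the class with the largest decision value is one common convention, assumed here to resolve cases where several or no classes claim the point.

```python
def decision_value(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical hyperplanes (w, b), one per class; positive side means "this class".
hyperplanes = {
    "A": ((0.0, 1.0), -0.5),    # A vs rest: x2 > 0.5
    "B": ((-1.0, 0.0), -1.0),   # B vs rest: x1 < -1
    "C": ((1.0, 0.0), -1.0),    # C vs rest: x1 > 1
}


def one_vs_rest_classify(hyperplanes, x):
    """Evaluate every class-vs-rest hyperplane and pick the most confident class."""
    scores = {label: decision_value(w, b, x) for label, (w, b) in hyperplanes.items()}
    return max(scores, key=scores.get), scores


label, scores = one_vs_rest_classify(hyperplanes, (2.5, -1.5))
```

For N classes this needs N binary classifiers, each trained on the full dataset.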
How does 1 vs 1 work?
similar to 1 vs rest
-> but do it for each pair of classes, disregarding all datapoints of the other classes for each pair
-> for each pair, evaluate to which class it would belong…
-> do majority vote…
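The pairwise voting can be sketched as follows. The pairwise classifier used here (pick the nearer of two class centers) is only a hypothetical stand-in for a binary SVM trained on the two classes; the centers and the query point are made up.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical class centers, used only by the stand-in pairwise classifier.
centers = {"A": (0.0, 3.0), "B": (-3.0, -2.0), "C": (3.0, -2.0)}


def nearer_center(a, b, x):
    """Stand-in for a binary SVM trained only on the datapoints of classes a and b."""
    return a if math.dist(x, centers[a]) <= math.dist(x, centers[b]) else b


def one_vs_one_classify(classes, pairwise_classifier, x):
    """Run every pairwise classifier on x and return the class with the most votes."""
    votes = Counter(pairwise_classifier(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0], votes


label, votes = one_vs_one_classify(sorted(centers), nearer_center, (2.5, -1.5))
```

For N classes this needs N * (N - 1) / 2 binary classifiers, but each one is trained on only two classes' worth of data.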
Again, image 1 vs rest, 1 vs 1
Why use kernel trick?
transformation to and from the higher-dimensional space is computationally complex…
What are pros of SVM?
accuracy: high classification rate
effective: even when number of dimensions > number of samples
robust: low tendency to overfitting
compact models: “plane in space”…
versatile: applicability of different kernel functions
What are cons of SVM?
efficiency: long training phase
complexity: high implementation effort
black-box: hard to interpret models
What are some classification applications?
big data
find patterns
make data usable
image classification
handwritten digits
x-rays
music classification
shazam
speech/language classification
siri/alexa/echo
fault detection
quality control during production
What can classification be used for in automotive?
perception
camera outputs pixel array
-> classification adds value to each pixel
-> pixel segmentation
What is the workflow of vehicle detection and tracking?
get training data
extract features from images
train classification model based on features
take one video frame and classify features of sub-images
merge classified areas and create bounding box
What does the kernel trick allow?
allows to calculate relationships between two vectors in a higher dimensional space
without a full space transformation.
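This can be verified numerically for the degree-2 polynomial kernel: squaring the dot product in the original 2-D space gives the same number as first mapping both vectors into 3-D and taking the dot product there. The explicit map `phi` is the standard one for this kernel; the two sample vectors are arbitrary.

```python
import math


def phi(x):
    """Explicit map of a 2-D point into 3-D: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kernel(x, y):
    """Homogeneous polynomial kernel of degree 2, evaluated in the original space."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))   # transform first, then take the dot product
implicit = kernel(x, y)          # same value without ever computing phi
```

Both values equal 121 (up to floating-point rounding), but the kernel needs only one 2-D dot product.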
Pipeline detect and classify cars in video stream?
extract features from training data
train classifier
apply sliding window
classify sub-images
merge classification result of sub-images
consider classification of previous frames
output bounding box with label
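The sliding-window and merge steps of this pipeline can be sketched without any image library. The classifier `is_car` is a hypothetical stand-in for the trained model, merging all hits into one box assumes a single object in the frame, and the use of previous frames is left out.

```python
def sliding_windows(frame_width, frame_height, window, stride):
    """Yield (x, y, w, h) boxes covering the frame with a fixed-size window."""
    for y in range(0, frame_height - window + 1, stride):
        for x in range(0, frame_width - window + 1, stride):
            yield (x, y, window, window)


def merge_boxes(boxes):
    """Merge positively classified sub-images into one enclosing bounding box."""
    x0 = min(x for x, _, _, _ in boxes)
    y0 = min(y for _, y, _, _ in boxes)
    x1 = max(x + w for x, _, w, _ in boxes)
    y1 = max(y + h for _, y, _, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)


def is_car(box):
    """Hypothetical stand-in for the trained classifier applied to a sub-image."""
    x, y, _, _ = box
    return 64 <= x <= 96 and y == 32


windows = list(sliding_windows(frame_width=256, frame_height=128, window=64, stride=32))
hits = [box for box in windows if is_car(box)]   # classify sub-images
bounding_box = merge_boxes(hits)                 # merge results into one box
```

In a real pipeline `is_car` would extract features from the pixels inside the box and pass them to the trained classifier.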