Why do we need unsupervised methods?
e.g. training data may include attacks… (we don't know; otherwise we would have labeled data…)
=> accommodate outliers in training data…
What is the basic idea of OC-SVM?
has parameter ν (nu) -> represents the percentage of outliers in the training data
-> model ignores the ν percent of data that are most anomalous (with respect to the rest of the training data)
=> thus can learn a model of "normal" even from tainted training data…
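A minimal sketch of this idea using scikit-learn's OneClassSVM (assuming scikit-learn is available; the toy data and the chosen nu value are made up for illustration):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
# mostly "normal" points plus a few injected outliers (~5% tainted)
X_train = np.vstack([
    rng.normal(0, 1, size=(95, 2)),
    rng.normal(6, 1, size=(5, 2)),
])

# nu ~ expected fraction of outliers the model is allowed to ignore
oc_svm = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05)
oc_svm.fit(X_train)

# +1 = "normal" (inlier), -1 = anomalous (outlier)
print(oc_svm.predict(np.array([[0.0, 0.0], [7.0, 7.0]])))
```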
What is a hyperplane?
divides n-dimensional space
-> in 2 dimensions: a straight line…
=> can be used to separate datapoints in n-dimensional feature space…
How to evaluate points versus hyperplane?
points on the same side
-> have the same sign of w*x + b…
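A tiny NumPy sketch of this check (the hyperplane parameters w and b here are hypothetical):

```python
import numpy as np

# hypothetical hyperplane parameters w, b in 2D
w = np.array([1.0, -2.0])
b = 0.5

def side(x):
    # the sign of w*x + b tells on which side of the hyperplane x lies
    return np.sign(np.dot(w, x) + b)

# points whose values share the same sign lie on the same side
print(side(np.array([3.0, 0.0])), side(np.array([0.0, 3.0])))
```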
What is generally the goal of SVM?
given: linearly separable data set x_i with corresponding labels y_i
=> find a hyperplane such that all training samples lie on the correct side…
What are problems with SVM?
how to find parameters of hyperplane?
uniqueness: are there other solutions (i.e. other parameters)
why should this be optimal? (as there are other possibilities if it is not unique…)
How to solve the uniqueness and parameter-finding problems in SVM? What is this called?
find hyperplane with samples on the "correct" side
-> such that two-norm of w is minimized
=> SVM hard margin
How do we mathematically find the SVM hard margin? What problem is still open?
instead of minimizing two norm of w
-> 1/2 * squared two norm of w
=> can be solved via quadratic programming…
=> results in w,b which determine the SVM
=> SVM(x) = sgn(w*x + b)
-> optimality?
How do we solve the optimality problem in SVM?
maximize distance of the hyperplane to the nearest data points (the margin)
-> minimizing the two-norm of w achieves exactly this (the margin equals 2/||w||)
=> so minimizing ||w|| also gives the maximum-margin hyperplane
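A small sketch with scikit-learn's SVC, using a linear kernel and a very large C to approximate the hard margin (the toy data is made up); it also shows that sgn(w*x + b) matches the classifier's prediction:

```python
import numpy as np
from sklearn.svm import SVC

# tiny linearly separable toy set
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 3.0],
              [-1.0, -1.0], [-2.0, -1.5], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# a very large C leaves almost no slack, approximating the hard margin
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]

# decision rule SVM(x) = sgn(w*x + b) agrees with clf.predict
x_new = np.array([0.5, 1.0])
print(np.sign(np.dot(w, x_new) + b), clf.predict([x_new])[0])
```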
What is the interpretation of the two norm?
the (Euclidean) length of a vector
What is SVM soft margin used for?
data cannot be linearly separated
-> use hinge loss:
-> sum over i of max(0, 1 − y_i(w*x_i − b))
How do we calculate SVM soft margin?
for some constant λ, solve:
arg min over w, b of
[ (1/n) * Σ_{i=1}^{n} max(0, 1 − y_i(w*x_i − b)) ] + λ * ||w||²
(where n is the number of training samples)
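A minimal NumPy sketch of this objective with a plain subgradient-descent fit (not the QP solver an SVM library would use; data, lambda, and learning rate are arbitrary):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, lam):
    # (1/n) * sum_i max(0, 1 - y_i*(w*x_i - b)) + lam * ||w||^2
    hinge = np.maximum(0.0, 1.0 - y * (X @ w - b)).mean()
    return hinge + lam * np.dot(w, w)

def fit_soft_margin(X, y, lam=0.01, lr=0.1, epochs=500):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = (1.0 - y * (X @ w - b)) > 0      # margin violations
        grad_w = -(y[active, None] * X[active]).sum(axis=0) / n + 2 * lam * w
        grad_b = y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = fit_soft_margin(X, y)
print(w, b, soft_margin_objective(w, b, X, y, lam=0.01))
```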
What is the effect of lambda in SVM?
trades off "margin size" and "placing x_i on the correct side"
-> balances bias and variance
How can one also solve SVM when the data is not linearly separable?
project into a higher-dimensional space
-> where it may be linearly separable
=> using a feature map
-> there exist lots of different feature maps… (e.g. (x1, x2) -> (x1, x2, x1² + x2²))
-> e.g. linear, polynomial, RBF
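A small sketch of the feature map mentioned above, (x1, x2) -> (x1, x2, x1² + x2²), which makes ring-shaped data separable by a plane in R³ (the example points are made up):

```python
import numpy as np

def phi(x):
    # feature map R^2 -> R^3: (x1, x2) -> (x1, x2, x1^2 + x2^2)
    x1, x2 = x
    return np.array([x1, x2, x1**2 + x2**2])

# an inner point and an outer point: circles are not linearly separable
# in R^2, but the third coordinate (squared radius) separates them in R^3
print(phi(np.array([0.5, 0.0])))   # small squared radius
print(phi(np.array([2.0, 0.0])))   # large squared radius
```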
What is a feature map?
a function
φ : R^J → R^K
where J < K
What is a kernel function?
k(x_i, x_j)
=
dot product (φ(x_i), φ(x_j))
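A small NumPy sketch illustrating this definition: the explicit quadratic feature map φ(x) = (x1², √2·x1·x2, x2²) gives the same value as evaluating the polynomial kernel (x_i * x_j)² directly (the vectors are arbitrary examples):

```python
import numpy as np

def phi_quad(x):
    # explicit quadratic feature map R^2 -> R^3
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, -1.0])

# kernel as dot product of the mapped points ...
print(np.dot(phi_quad(x_i), phi_quad(x_j)))
# ... equals the polynomial kernel (x_i * x_j)^2 evaluated directly
print(np.dot(x_i, x_j) ** 2)
```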
What is the RBF kernel function?
k_rbf(x_i, x_j)
=
e^( −γ ||x_i − x_j||^2 )
with γ > 0
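A minimal sketch computing this, checked against scikit-learn's rbf_kernel (gamma and the vectors are arbitrary):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def k_rbf(x_i, x_j, gamma=0.5):
    # k_rbf(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), gamma > 0
    return np.exp(-gamma * np.sum((x_i - x_j) ** 2))

x_i = np.array([1.0, 2.0])
x_j = np.array([2.0, 0.0])
print(k_rbf(x_i, x_j))
print(rbf_kernel(x_i.reshape(1, -1), x_j.reshape(1, -1), gamma=0.5))
```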
How can we use SVM for one class labels? (i.e. only positive class)
points on the inside have label +1
points on the outside have -1
=> construct hyperplane between the data and the origin (i.e. [0, 0, …, 0])
What is a problem if we use the regular hinge loss minimization for OC-SVM?
due to the definition y = 1 for all points, the hinge-loss term simplifies
-> can result in unwanted properties (loss = 0 regardless of the hyperplane…)
=> have to introduce -b to the loss…
=> have a closer look at the mathematical properties of this…
What is k-NN, what can it be used for?
supervised learning technique for both regression and classification
can also be used for unsupervised anomaly detection
-> single hyperparameter k, number of neighbors
What are properties of the kNN algorithm?
classify new point (x,y) based on neighborhood
-> namely its k nearest neighbors
compute distance to all other points
select k nearest instances
determine class by majority vote
=> no training phase…
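A minimal NumPy sketch of these steps (the toy data and the choice of k are made up; distances here are Euclidean):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # no training phase: just keep the data and compare at query time
    dists = np.linalg.norm(X_train - x_new, axis=1)   # distance to all points
    nearest = np.argsort(dists)[:k]                   # k nearest instances
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))
```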