How to calculate the distance between a line (hyperplane) and a point?
dot product of the normal vector and the point x, minus the offset
=> e.g. wx - b
-> simply plug the point in to calculate the (scaled) distance…
What is the effect of ||w|| on the distance?
it scales it…
=> wx - b would have to be divided by ||w|| to get the actual distance…
=> unless ||w|| = 1 (i.e. w is a "real" unit vector…)
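A minimal NumPy sketch of the two cards above (w, b, and the point are made-up values): it evaluates wx - b and shows that dividing by ||w|| turns the scaled value into the actual geometric distance.

```python
import numpy as np

w = np.array([3.0, 4.0])        # normal vector of the hyperplane (made-up)
b = 5.0                         # offset (made-up)
x = np.array([4.0, 3.0])        # query point (made-up)

raw = w @ x - b                 # plug the point in: wx - b, still scaled by ||w||
dist = raw / np.linalg.norm(w)  # normalize -> actual signed distance

print(raw)   # 19.0
print(dist)  # 3.8 (= 19 / 5, since ||w|| = 5)
```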
What is a unit vector?
it encodes the orientation (slope) of the line
-> it is orthogonal to the line and passes through the origin…
(the shift along the y-axis, i.e. the offset, is neglected…)
e.g. w = (3, 4) -> ||w|| = 5 -> unit vector w/||w|| = (0.6, 0.8)
What is the goal of SVM?
divide datapoints into different regions in space using hyperplanes…
What is the mathematical approach to SVM?
minimize ||w|| over all datapoints…
=> subject to y_i(wx_i + b) >= 1 for every datapoint (hard margin)
Why does SVM minimize ||w||?
the constraint fixes the scale: y_i(wx_i + b) >= 1
=> with smaller ||w||, each x must be further away (-> result of the normalization: each distance is at least 1/||w||…)
=> brings the datapoints as far away from the hyperplane as possible…
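A small scikit-learn sketch of this (the toy data and the large-C trick to approximate a hard margin are my choices, not from the cards): after fitting, the margin 1/||w|| can be read off the learned weights.

```python
import numpy as np
from sklearn.svm import SVC

# two linearly separable blobs (made-up toy data)
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# very large C ~ (almost) hard margin: violations effectively forbidden
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
print(1.0 / np.linalg.norm(w))  # margin: distance of the closest points to the hyperplane
```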
What is a problem with solely minimizing ||w||?
what if not all datapoints are on the correct side (and cannot be brought there by moving the hyperplane, i.e. the data is not linearly separable)?
What is the intention of the SVM soft margin?
only consider the closest datapoints to be relevant
-> tolerate datapoints that lie on the wrong side…
=> using lambda as a "trade-off factor"
How is the SVM soft margin calculated?
arg min_{w,b} [ 1/n * sum_i max(0, 1 - y_i(wx_i + b)) ] + lambda * ||w||^2
disregard everything that is further away than a certain distance (via max(0, …) -> the argument becomes negative, i.e. the hinge loss becomes 0, once a point is beyond the margin on the correct side)
the lambda * ||w||^2 term added to the loss controls how strongly ||w|| is minimized
-> larger lambda -> more of the closer (violating) points get disregarded…
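The objective above translated directly into NumPy (function name and toy inputs are mine): the hinge term max(0, 1 - y_i(wx_i + b)) is exactly what becomes 0 for points safely beyond the margin.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, lam):
    """1/n * sum_i max(0, 1 - y_i(w.x_i + b)) + lambda * ||w||^2"""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)  # 0 for points far enough on the correct side
    return hinge.mean() + lam * np.dot(w, w)

# tiny made-up example: both points are beyond the margin, so only lambda*||w||^2 remains
X = np.array([[1.0, 2.0], [-1.0, -1.5]])
y = np.array([1, -1])
print(soft_margin_objective(np.array([0.5, 0.5]), 0.0, X, y, lam=0.1))  # 0.05
```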
What is the result of choices for lambda?
too large lambda
-> margin becomes very wide, even points further away remain important, violations are barely penalized -> underfitting
too small lambda
-> only the closest points matter, the boundary follows the noise -> overfitting…
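A hedged sketch of this trade-off with scikit-learn (my mapping: sklearn's SVC exposes C, which acts roughly like 1/lambda, so small C corresponds to large lambda; the data is made up):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in [0.01, 1.0, 100.0]:  # small C ~ large lambda, large C ~ small lambda
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 1.0 / np.linalg.norm(clf.coef_[0])
    # small C (large lambda): wide margin, many support vectors -> underfitting risk
    # large C (small lambda): narrow margin, few support vectors -> overfitting risk
    print(f"C={C}: margin={margin:.3f}, support vectors={clf.n_support_.sum()}")
```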
What is a drawback of SVM?
requires a labeled dataset…
What is the idea behind OC-SVM?
only instances of one class are given
separate them from the origin…
by maximizing the margin to the origin…
How is OC-SVM calculated and what are its elements?
arg min_{w, xi_i, b} 1/2 ||w||^2 + 1/(n*nu) * sum_i xi_i - b
1/2 ||w||^2
-> extend the "reach" as far as possible by minimizing w… (smaller w -> higher distance to the data)
1/(n * nu)
-> divide by the number of datapoints n so no single point has too much influence…
-> nu -> additional factor in ]0, 1] -> reduces the importance of datapoints being on the correct side by scaling down the overall error they introduce
-> nu represents the fraction of outliers that is tolerable…
sum over the xi_i, minus b
xi_i is a "slack variable"
-> replaces the actual distance of a datapoint to the hyperplane but is at least as large as it…
-> -b is also introduced as a penalty to not allow for "cheating" via the offset…
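A short sketch with scikit-learn's OneClassSVM (data, kernel, and the nu value are my choices): nu upper-bounds the fraction of training points that may end up on the wrong side, visible in the fraction of -1 predictions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))  # training instances of the one "normal" class (made-up)
X[:10] += 6                     # a few injected outliers

oc = OneClassSVM(kernel="rbf", nu=0.1).fit(X)  # nu ~ tolerable fraction of outliers
pred = oc.predict(X)            # +1 = inlier, -1 = outlier
print((pred == -1).mean())      # roughly <= nu
```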
Difference between OC-SVM and SVM?
SVM:
semi-supervised
data must be clean
density estimation
OC-SVM:
unsupervised
allows the training data to contain outliers
requires the fraction of outliers nu
derived from SVM
no density estimation, but a max-margin model