How to calculate the distance between a line (hyperplane) and a point?
dot product of the unit normal vector and the point x, minus the offset
=> e.g. wx - b
-> simply plug the point x in to get the distance (exact only if ||w|| = 1)…
What is the effect of ||w|| on the distance?
it scales it: wx - b is the actual distance multiplied by ||w||…
=> the value has to be normalized (divided by ||w||) to get the actual distance…
in general ||w|| != 1 (w is not a “real” unit vector…)
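A minimal sketch of the normalization step, using the wx - b convention from the card above. The function name `signed_distance` and the example numbers are illustrative assumptions, not part of the notes.

```python
import math

def signed_distance(w, b, x):
    """Signed distance of point x to the hyperplane w·x - b = 0."""
    raw = sum(wi * xi for wi, xi in zip(w, x)) - b   # still scaled by ||w||
    norm = math.sqrt(sum(wi * wi for wi in w))       # ||w||
    return raw / norm                                # actual geometric distance

# Illustrative values: ||w|| = 5, so the raw value is 5 times the distance.
w = (3.0, 4.0)
b = 5.0
x = (3.0, 4.0)

raw_value = sum(wi * xi for wi, xi in zip(w, x)) - b   # 9 + 16 - 5 = 20
distance = signed_distance(w, b, x)                    # 20 / 5 = 4

# Rescaling w and b describes the same hyperplane and leaves the distance unchanged.
distance_rescaled = signed_distance((6.0, 8.0), 10.0, x)
```

The raw value wx - b changes when w and b are rescaled, while the normalized value does not, which is why only the normalized value is a distance.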
What is a unit vector?
vector of length 1; here the normalized normal vector w / ||w||, which determines the slope of the line
-> orthogonal to the line, passing through the origin…
(the shift along the y-axis is ignored…)
What is the goal of SVM?
divide datapoints into different regions in space using hyperplanes…
What is the mathematical approach to SVM?
minimize ||w|| subject to y_i(wx_i + b) >= 1 for all datapoints…
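Why does SVM minimize ||w||?
the constraint y_i(wx_i + b) >= 1 must still hold with a smaller ||w||
=> x must be further away from the hyperplane (-> result of the normalization: distance = (wx + b) / ||w||)
=> pushes the hyperplane as far away from the datapoints as possible (maximum margin)…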
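The step from "smaller ||w||" to "datapoints further away" can be written out as a short derivation. It assumes the hard-margin constraint y_i(wx_i + b) >= 1 with labels y_i in {-1, +1}, which is the usual formulation and an assumption about what these notes intend.

```latex
\begin{align*}
\text{distance of } x_i \text{ to the hyperplane}
  &= \frac{|w \cdot x_i + b|}{\lVert w \rVert} \\
y_i (w \cdot x_i + b) \ge 1
  \;&\Rightarrow\; \frac{|w \cdot x_i + b|}{\lVert w \rVert} \ge \frac{1}{\lVert w \rVert} \\
\text{margin width} &= \frac{2}{\lVert w \rVert}
  \quad\Rightarrow\quad \min \lVert w \rVert \;\Leftrightarrow\; \max \text{margin}
\end{align*}
```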
What is a problem with solely minimizing ||w||?
what if not all datapoints are on the correct side (and cannot be brought there by moving the hyperplane)?
What is the intention of SVM soft margin?
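only the datapoints closest to the hyperplane are considered relevant
-> tolerate datapoints that lie on the wrong side (they are penalized instead of forbidden)…
=> using lambda as “trade-off factor” between margin width and misclassification penalty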
How is SVM soft margin calculated?
arg min (w,b) [ sum over all datapoints i of max(0, 1 - y_i(wx_i + b)) ] + lambda ||w||^2
-> the max(0, …) term disregards every datapoint that is further away than a certain distance on the correct side (there the right-hand side becomes negative and is clipped to 0)
-> the lambda ||w||^2 term added to the loss keeps ||w|| from growing: a larger ||w|| narrows the margin, so that even closer datapoints would be disregarded
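The soft-margin objective can be sketched in a few lines of plain Python. The function names (`hinge_loss`, `soft_margin_objective`) and the toy data are assumptions made for illustration; the sketch only evaluates the objective for a given (w, b) and does not perform the arg min.

```python
def hinge_loss(w, b, x, y):
    """max(0, 1 - y (w·x + b)) for one datapoint with label y in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(w, b, X, Y, lam):
    """Sum of hinge losses over all datapoints plus lam * ||w||^2."""
    data_term = sum(hinge_loss(w, b, x, y) for x, y in zip(X, Y))
    reg_term = lam * sum(wi * wi for wi in w)
    return data_term + reg_term

# Toy data (hypothetical): two positive points and one negative point.
X = [(3.0, 0.0), (0.5, 0.0), (-1.0, 0.0)]
Y = [1, 1, -1]
w, b = (1.0, 0.0), 0.0

# Point 1 lies far on the correct side and is clipped to 0,
# point 2 lies inside the margin, point 3 sits exactly on the margin.
losses = [hinge_loss(w, b, x, y) for x, y in zip(X, Y)]    # [0.0, 0.5, 0.0]
objective = soft_margin_objective(w, b, X, Y, lam=0.1)     # 0.5 + 0.1 * 1 = 0.6
```

Only the point inside the margin contributes to the data term, which is the "disregard everything further away" behavior of the max(0, …) clipping.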
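What is the result of choices for lambda?
too large lambda
-> ||w|| is pushed towards 0, the margin gets very wide, points further away remain important -> underfitting
too small lambda
-> hardly any regularization, the hyperplane is driven by the few closest points -> overfitting…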
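To see how lambda moves the solution, here is a small 1D sketch that minimizes the soft-margin objective by brute-force grid search. Fixing b = 0, the grid range, and the toy data are simplifying assumptions for illustration; a real solver would optimize w and b jointly.

```python
def objective(w, X, Y, lam):
    """1D soft-margin objective with b fixed to 0 (simplifying assumption)."""
    hinge = sum(max(0.0, 1.0 - y * w * x) for x, y in zip(X, Y))
    return hinge + lam * w * w

def best_w(X, Y, lam):
    """Brute-force arg min over a coarse grid of w values in [0, 2]."""
    grid = [i / 100 for i in range(0, 201)]
    return min(grid, key=lambda w: objective(w, X, Y, lam))

# Toy data (hypothetical): symmetric, linearly separable around 0.
X = [2.0, 1.0, -1.0, -2.0]
Y = [1, 1, -1, -1]

w_small_lam = best_w(X, Y, lam=0.01)   # closest points end up exactly on the margin
w_large_lam = best_w(X, Y, lam=10.0)   # w is shrunk, the margin widens

margin_small_lam = 1.0 / w_small_lam   # distance from hyperplane to margin boundary
margin_large_lam = 1.0 / w_large_lam
```

With the large lambda the margin boundary lies beyond every datapoint in this toy set, so all four points contribute a nonzero hinge loss. With the small lambda none of them does.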
What is a drawback of SVM?
requires labeled dataset…
What is the idea behind OC-SVM?
only instances of one group
separate them from origin…
by maximizing margin from origin…
How to calc OCSVM, what are the elements?
arg min (w, xi_i, b)
1/2 ||w||^2
-> extend the “reach” as far as possible by minimizing ||w||… (smaller ||w|| -> hyperplane lies further from the origin, distance b / ||w||)
+ 1/(n ny)
-> divide by the number of datapoints n so that the error term does not grow with the size of the dataset…
-> ny -> additional factor, element of ]0,1] -> a larger ny reduces the importance of datapoints being on the correct side by reducing the overall error introduced by the datapoints
-> ny represents the fraction of outliers that is tolerable (upper bound)…
* sum over xi_i, then - b
xi_i is the “slack variable”
-> replaces the actual margin violation of the datapoint but is at least as large as it (xi_i >= b - wx_i, xi_i >= 0)…
-> also introduce -b as a reward for a large offset to not allow for “cheating” (without it, shrinking b would satisfy every constraint for free)
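The elements above can be combined into a short sketch that evaluates the OC-SVM objective for a given (w, b). It assumes the standard linear constraint wx_i >= b - xi_i with xi_i >= 0, so the smallest feasible slack is max(0, b - wx_i). The function name `ocsvm_objective` and the toy data are illustrative assumptions, and the sketch does not perform the arg min.

```python
def ocsvm_objective(w, b, X, ny):
    """1/2 ||w||^2 + 1/(n ny) * sum(xi_i) - b, using the smallest feasible slacks."""
    n = len(X)
    # Slack is zero for points on the correct side (w·x_i >= b),
    # otherwise it equals the margin violation b - w·x_i.
    slacks = [max(0.0, b - sum(wi * xi for wi, xi in zip(w, x))) for x in X]
    reg_term = 0.5 * sum(wi * wi for wi in w)
    value = reg_term + sum(slacks) / (n * ny) - b
    return value, slacks

# Toy data (hypothetical): one group of points, one of them close to the origin.
X = [(2.0, 0.0), (3.0, 0.0), (0.5, 0.0), (4.0, 0.0)]
w = (1.0, 0.0)
ny = 0.5

value, slacks = ocsvm_objective(w, 1.0, X, ny)      # 0.5 + 0.5 / 2 - 1 = -0.25
value_b0, slacks_b0 = ocsvm_objective(w, 0.0, X, ny)  # 0.5 + 0 - 0 = 0.5
```

With b = 0 every point is on the correct side and no slack is needed, yet the objective is worse than with b = 1, because the -b term rewards pushing the hyperplane away from the origin even at the cost of one slack.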
Difference OCSVM to SVM?
semi supervised
data must be clean
density estimation
allows for training data to contain outliers
requires fraction of outliers ny
derived from SVM
no density estimation but max-margin model