What is an anomaly?
rare item
-> which raises suspicion by differing significantly from the majority of the data
What is a point anomaly?
individual data instance
anomalous w.r.t. (global) rest of data
What is a contextual anomaly?
single data instance that is anomalous
in specific (local) context but not otherwise
What is a collective anomaly?
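collection of related data instances
that is anomalous w.r.t. the entire data set, even if the individual instances are not anomalous on their own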
What is the taxonomy of anomaly detection approaches (by available labels)?
supervised
semi-supervised (labels for only one class, typically the normal one)
unsupervised (no labels at all)
What do we need to do when we expect collective anomalies?
aggregate features over time to represent the context in the data…
e.g. sliding window…
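A minimal sketch of sliding-window aggregation, assuming the feature is a simple event count per time step. The function name `sliding_window_sums`, the window size and the sample counts are hypothetical, chosen only for illustration.

```python
from collections import deque


def sliding_window_sums(values, window):
    """Return the sum of every full window of `window` consecutive values."""
    sums = []
    buf = deque(maxlen=window)  # oldest value drops out automatically
    for v in values:
        buf.append(v)
        if len(buf) == window:
            sums.append(sum(buf))
    return sums


# Hypothetical event counts per minute. No single value is extreme,
# but the aggregated windows make the burst in the middle visible.
counts = [1, 0, 2, 1, 4, 5, 4, 1, 0, 1]
window_sums = sliding_window_sums(counts, window=3)
```

The aggregated values, not the raw ones, would then be fed into whatever model of normality is used.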
What approaches are there to model “normality”?
threshold based
-> e.g. have count-based features and define limits within which a data point is considered normal
fit distribution to the data
What are advantages and disadvantages of thresholding?
threshold is chosen arbitrarily
yields only a binary output, not a probabilistic score…
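A small sketch of threshold-based detection on a count feature. The limits `LOWER` and `UPPER` and the login counts are made-up values; in practice they would have to be chosen by hand, which is exactly the arbitrariness noted above.

```python
def is_normal(count, lower, upper):
    """A data point counts as normal if it lies within the fixed limits."""
    return lower <= count <= upper


# Hypothetical limits and data (logins per hour).
LOWER, UPPER = 0, 20
logins_per_hour = [3, 5, 4, 42, 6]

# The output is purely binary: inside the limits or not.
anomalies = [c for c in logins_per_hour if not is_normal(c, LOWER, UPPER)]
```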
What is an advantage of fitting a distribution to data for anomaly detection?
yields a probabilistic score instead of a binary decision
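As a rough illustration, the sketch below fits a normal distribution to a few made-up measurements with Python's `statistics.NormalDist` and uses the density as a graded score. Choosing a normal distribution here is an assumption, not something the notes prescribe.

```python
from statistics import NormalDist

# Hypothetical measurements that are assumed to be "normal" behaviour.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]

# Estimate mean and standard deviation from the samples.
model = NormalDist.from_samples(data)


def density_score(x):
    """Density of the fitted distribution at x; lower means more unusual."""
    return model.pdf(x)


typical = density_score(10.0)
unusual = density_score(12.0)
```

Unlike a fixed limit, the score changes gradually with the distance from the bulk of the data.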
When is a function a probability density function?
f(x) >= 0 for all x in R
integrates to 1 (from negative infinity to positive infinity)
allowed:
there may exist individual points x where f(x) > 1
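The following sketch checks both points numerically for a narrow normal distribution (standard deviation 0.1, an arbitrary choice): the density at the peak exceeds 1, yet the area under the curve is still about 1. The helper `riemann_integral` is a hypothetical midpoint-rule integrator written for this example.

```python
from statistics import NormalDist

# A narrow normal distribution; sigma = 0.1 is an arbitrary example value.
narrow = NormalDist(mu=0.0, sigma=0.1)

# Density at the peak: 1 / (0.1 * sqrt(2 * pi)), roughly 3.99, so > 1.
peak = narrow.pdf(0.0)


def riemann_integral(f, lo, hi, steps=20000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# [-1, 1] covers +/- 10 standard deviations, so almost all of the mass.
area = riemann_integral(narrow.pdf, -1.0, 1.0)
```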
When is a function a probability?
P(X = x) = 0 for any single point x in R (continuous case)
integrating the density over any region always yields a result between 0 and 1
When is a function a cumulative distribution function?
integrates the density from negative infinity up to an arbitrary x
-> integration always yields a result <= 1
=> contrary to the probability of a region: the lower bound is always negative infinity, not some arbitrary value…
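Written out, the relation between density, probability and cumulative distribution function looks as follows. The symbols F, f, a and b are notation chosen for this summary and assume a continuous random variable X.

```latex
\begin{aligned}
F(x) &= P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \\
P(a < X \le b) &= \int_{a}^{b} f(t)\,dt = F(b) - F(a) \\
P(X = x) &= \int_{x}^{x} f(t)\,dt = 0
\end{aligned}
```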
What is a problem of naively fitting a distribution to data?
which one to choose?
often not a good fit…
What is KDE (kernel density estimation)?
non-parametric -> no need to explicitly specify which probability distribution to use
density estimation
How do we do KDE?
estimator (bandwidth h fixed to 1 here)
$\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K(x - x_i)$
-> sum over kernels, each shifted to one data point x_i…
What are kernels? Why the normalizing factor 1/n before the sum?
Kernel -> non-negative function that integrates to 1
-> 1/n to scale the sum so that it also integrates to 1 (as we sum the kernel n times at different positions…)
What is often used as kernel? What is the formula?
(standard) normal distribution
$K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2 / 2}$
What is additionally introduced in KDE aside from summation over kernels?
bandwidth h
=> $\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$
=> smooths the fit to avoid overfitting
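The estimator with bandwidth can be sketched in a few lines of plain Python. The function names `gaussian_kernel` and `kde`, the sample data and the bandwidth value are assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: non-negative and integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h):
    """Kernel density estimate at x: 1/(n*h) * sum of K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


# Hypothetical data: four points close together and one far away.
data = [1.0, 1.2, 0.9, 1.1, 5.0]

dense = kde(1.0, data, h=0.5)   # in the middle of the cluster
sparse = kde(3.0, data, h=0.5)  # in the gap between cluster and outlier
```

The estimate is high where many data points lie close by and low in regions without data.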
How to use bandwidth in KDE?
h > 0
too small h potentially results in overfitting (too high variance)
too large h leads to underfitting (too much bias)
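To make the effect of h tangible, the sketch below compares the estimated density in a gap without data to the density at an observed point, once for a very small and once for a very large bandwidth. The data and both bandwidth values are arbitrary example choices, and `kde` is a hypothetical helper using a Gaussian kernel.

```python
import math


def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h > 0."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


data = [1.0, 2.0, 3.0, 7.0]  # hypothetical sample
gap, observed = 5.0, 1.0     # a point with no data nearby vs. a data point

# Small h: spikes at the data points, almost zero in between (overfitting).
ratio_small_h = kde(gap, data, h=0.1) / kde(observed, data, h=0.1)

# Large h: nearly flat curve, the gap looks as likely as the data (underfitting).
ratio_large_h = kde(gap, data, h=5.0) / kde(observed, data, h=5.0)
```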
How to find out which h to use?
cross validation
-> input:
list of candidate values for h
data points
=> for a given h
fit the model using leave-one-out
sum the scores obtained on the left-out points…
=> use the h where the average score is best…
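One possible way to implement this procedure is sketched below. It assumes a Gaussian kernel and uses the log of the estimated density at the left-out point as the score; the notes do not say which score is meant, so that choice, the function names and the sample data are assumptions.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def loo_score(data, h):
    """Average log-density of each point under a KDE fitted without it."""
    n = len(data)
    total = 0.0
    for i, x in enumerate(data):
        rest = data[:i] + data[i + 1:]  # leave point i out
        density = sum(gaussian_kernel((x - xj) / h) for xj in rest) / ((n - 1) * h)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total / n


def select_bandwidth(data, candidates):
    """Return the candidate h with the best (highest) average score."""
    return max(candidates, key=lambda h: loo_score(data, h))


# Hypothetical data and candidate bandwidths.
data = [0.0, 0.3, 0.5, 0.9, 1.1, 1.4, 1.8, 2.0]
candidates = [0.01, 0.1, 0.3, 1.0, 10.0]
best_h = select_bandwidth(data, candidates)
```

Very small candidates are penalized because each left-out point falls into a region of near-zero density, very large ones because the estimate becomes flat.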
How can we apply KDE to find anomalies?
put the new instance into the estimator
-> when the result is lower than a threshold C => anomaly…
as the threshold C can be very small
-> plot with a log scale…
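A minimal sketch of this decision rule, working directly with log-densities so that a very small threshold C stays manageable. The training data, the bandwidth `H` and the threshold `LOG_C` are hypothetical values, and a Gaussian kernel is assumed.

```python
import math


def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h > 0."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def is_anomaly(x, data, h, log_threshold):
    """Flag x if its log-density falls below log(C)."""
    density = gaussian_kde(x, data, h)
    log_density = math.log(density) if density > 0 else float("-inf")
    return log_density < log_threshold


# Hypothetical training data, bandwidth and threshold C = 1e-6.
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
H = 0.2
LOG_C = math.log(1e-6)

flags = {x: is_anomaly(x, train, H, LOG_C) for x in (10.05, 13.0)}
```

Here 10.05 lies inside the training data and is not flagged, while 13.0 is far from every training point and is flagged.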