ML ITSec

by Jensen J.

What is the definition of anomaly detection?

identification of rare items
-> which raise suspicion by differing significantly from the majority of the data

What are point anomalies?

individual data instance
that can be considered anomalous with respect to the rest of the data

What are collective anomalies?

collection of related instances is anomalous with respect to the entire data set

What are contextual anomalies?

data instance is anomalous in a specific context
but not otherwise

Categorize different anomaly types w.r.t. context and amount of data points

How do we have to process features to detect collective anomalies?

aggregate features over time
i.e. sliding window
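
A minimal sketch of sliding-window aggregation, assuming the feature is a simple per-time-step count and the aggregate is a sum. The function name `sliding_window_sums` and the failed-login data are hypothetical illustrations, not taken from the course.

```python
from collections import deque

def sliding_window_sums(values, window):
    """Return the sum of every full window of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)
    sums = []
    total = 0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # oldest value is about to drop out of the window
        buf.append(v)
        total += v
        if len(buf) == window:
            sums.append(total)
    return sums

# Hypothetical per-minute failed-login counts: the sustained run of 4s is
# easier to spot in the aggregated feature than in the individual values.
failed_logins = [1, 0, 2, 1, 4, 4, 4, 4, 1, 0]
window_sums = sliding_window_sums(failed_logins, window=4)
```

The aggregated values, not the raw ones, would then be fed into whatever model decides what counts as normal.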

How can we determine what is normal w.r.t. features?

threshold
choose some distribution and fit the distribution to the data
use kernel density estimation

What are drawbacks of threshold based modeling?

arbitrary
yields only binary yes / no answer
- would rather like a probabilistic result

What are drawbacks of fitting a probability distribution to the data?

how/which to choose?
often not a good fit

What is the difference between a probability density function, a probability function and a cumulative distribution function?

PDF:
- continuous representation of the distribution of a random variable
- integral over the whole range equals 1
- probability of any individual value equals 0
- non-negative everywhere
PF:
- probability of an individual value is still 0
  - P(X = x) = 0
- integral of the PDF between two points gives the probability of that interval and is smaller than or equal to 1
CDF:
- accumulates probability from left to right until we reach 1
- -> P(Z <= x) = integral of the PDF from -infinity to x ; <= 1
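
These properties can be checked numerically. The sketch below assumes a standard normal distribution as the example and uses a simple midpoint rule; `integrate_pdf` is a hypothetical helper, not part of the course material.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution

def integrate_pdf(a, b, steps=10_000):
    """Approximate the integral of the PDF over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# The PDF integrates to (about) 1 over a range that covers nearly all mass.
total_mass = integrate_pdf(-8.0, 8.0)

# Probability of an interval: integral of the PDF between two points ...
interval_prob = integrate_pdf(-1.0, 1.0)

# ... which matches the difference of two CDF values.
cdf_difference = dist.cdf(1.0) - dist.cdf(-1.0)

# A single point is an interval of width 0, so its probability is 0.
point_prob = integrate_pdf(0.5, 0.5)
```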

What are the two characteristics of KDE?

non-parametric
- we do not explicitly specify which probability distribution to use
density estimation
- but we still estimate a probability distribution (instead of a more naive approach such as “remembering” what is normal)

How can we use kernels to approximate an unknown distribution?

draw n univariate (no vectors) samples (independently and identically distributed)
use some kernel function and sum these kernels, each shifted by a sample value, then normalize by the number of samples
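
The construction can be sketched in a few lines, assuming a Gaussian kernel and no bandwidth factor yet. The names `gaussian_kernel` and `kde` and the sample values are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples):
    """Sum of kernels, each shifted to one sample value, divided by n."""
    return sum(gaussian_kernel(x - xi) for xi in samples) / len(samples)

samples = [-1.0, 0.0, 0.5, 2.0]  # hypothetical univariate draws
density_at_zero = kde(0.0, samples)

# The estimate is itself a density: summed over a fine grid it gives about 1.
step = 0.01
grid = [-10.0 + i * step for i in range(2001)]
total_mass = sum(kde(x, samples) for x in grid) * step
```

Dividing by the number of samples is what keeps the total area at 1, since each individual kernel already integrates to 1.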

What is the bandwidth used for in kernels?

to smooth them
-> large bandwidth -> high degree of smoothing -> potential underfitting
-> small bandwidth -> low degree of smoothing -> jagged -> overfitting
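
One way to see the effect is to count the peaks of the estimate for a small and a large bandwidth. This is a sketch assuming a Gaussian kernel; the helper `count_modes`, the sample values and both bandwidths are hypothetical choices for illustration.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h):
    """Gaussian KDE with bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (len(samples) * h)

def count_modes(samples, h, lo, hi, steps=1300):
    """Count local maxima of the estimate on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [kde(x, samples, h) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] >= ys[i + 1])

samples = [0.0, 1.0, 2.0, 6.0, 7.0]  # hypothetical data

# Small bandwidth: one narrow spike per sample (jagged, overfits).
modes_small_h = count_modes(samples, h=0.1, lo=-3.0, hi=10.0)

# Large bandwidth: everything is blurred into a single bump (underfits).
modes_large_h = count_modes(samples, h=5.0, lo=-3.0, hi=10.0)
```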

What is the formula for the usual kernel we use (normal distribution), with and without the bandwidth factor?

with bandwidth

without bandwidth
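
For reference, a sketch of the standard Gaussian kernel in common textbook notation. The symbols K, K_h, u and h are assumed here and may differ from the notation used in the course.

```latex
% with bandwidth h > 0
K_h(u) = \frac{1}{h} K\left(\frac{u}{h}\right)
       = \frac{1}{h\sqrt{2\pi}} \exp\left(-\frac{u^2}{2h^2}\right)

% without bandwidth (standard normal density, i.e. h = 1)
K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)
```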

What is the formula for the estimator?
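
A sketch of the usual kernel density estimator, assuming n samples x_1, ..., x_n, a kernel K and a bandwidth h > 0. The notation is the common textbook one and may differ from the course.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)
```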

How does cross validation work?

split training data into n parts
use n-1 for training and 1 for validation
use a different part for validation in each round
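
A minimal sketch of the splitting step. The number of parts is called `k` here, and the function name `k_fold_splits` and the strided assignment of items to parts are assumptions made for illustration.

```python
def k_fold_splits(data, k):
    """Yield (train, validation) pairs; each part is the validation set once."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    # Part i takes every k-th item starting at index i.
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

data = list(range(10))
splits = list(k_fold_splits(data, k=5))
```

If the data is ordered (for example by time), shuffling before splitting or using contiguous parts may be more appropriate than the strided split used here.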

How can we use cross validation to find the best h?

we have a list of candidate values for the hyperparameter h
we have our cross validation splits

for all candidate values h
1. for all cross validation splits
  1. fit model on data without current validation split
  2. evaluate on the held-out split
return h where average validation score is best

-> do cross validation for each hyperparameter

-> return the h where we receive the average best validation score (across the different cross validation runs)
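
The loop can be sketched as follows. The validation score is assumed to be the mean log-density of the held-out points, which is a common choice but not stated in the cards. All function names, the synthetic data and the candidate list are hypothetical.

```python
import math
import random

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h):
    """Gaussian KDE with bandwidth h, 'fitted' by storing the samples."""
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (len(samples) * h)

def k_fold_splits(data, k):
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

def mean_log_density(train, validation, h):
    """Assumed score: average log-density of held-out points (higher is better)."""
    floor = 1e-300  # avoids log(0) if the density underflows
    return sum(math.log(max(kde(x, train, h), floor)) for x in validation) / len(validation)

def best_bandwidth(data, candidates, k):
    best_h, best_score = None, -math.inf
    for h in candidates:                          # for all candidate values h
        scores = [mean_log_density(train, val, h)  # for all validation splits
                  for train, val in k_fold_splits(data, k)]
        average = sum(scores) / len(scores)
        if average > best_score:
            best_h, best_score = h, average
    return best_h

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic "normal" data
candidates = [0.01, 0.1, 0.3, 0.5, 1.0, 5.0]
h_star = best_bandwidth(data, candidates, k=5)
```

The nested loops make the cost visible: every candidate value is fitted and scored once per split.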

How long does KDE with cross validation take?

O(h*k), where h is the number of candidate values and k the number of cross validation splits
-> for every candidate value, run k validation runs

How do we use KDE to evaluate a new instance w.r.t. anomaly?

we put the value into our estimator
if the estimated density lies below a threshold C
=> it is an anomaly
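
A minimal sketch of the decision rule, assuming a Gaussian kernel. The training values, the bandwidth `h` and the threshold `C` are hypothetical and would have to be chosen for the actual data.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h):
    """Gaussian KDE with bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (len(samples) * h)

def is_anomaly(x, samples, h, threshold):
    """Flag x when its estimated density lies below the threshold."""
    return kde(x, samples, h) < threshold

normal_samples = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0]  # hypothetical
h = 0.3   # assumed bandwidth
C = 0.01  # assumed threshold

typical = is_anomaly(10.05, normal_samples, h, C)  # close to the training data
outlier = is_anomaly(14.0, normal_samples, h, C)   # far from all training data
```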

How can one improve visualization of KDE anomaly detection?

plot it on logarithmic scale (natural logarithm ln)
-> as the threshold is usually very small
