What is kNN?
supervised learning technique
can be used for classification and regression
can also be used for unsupervised anomaly detection
single hyperparameter k
How does the kNN algorithm work?
classify a new point x_new based on its neighborhood
-> no training phase
-> compute distance to ALL OTHER POINTS
select k nearest instances and assign x_new by majority vote…
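A minimal sketch of this procedure with NumPy (function and variable names such as knn_classify and X_train are illustrative, not from the card):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=5, p=2):
    # no training phase: compute the L_p distance from x_new to ALL stored points
    dists = np.sum(np.abs(X_train - x_new) ** p, axis=1) ** (1 / p)
    nearest = np.argsort(dists)[:k]                        # indices of the k nearest instances
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote over their labels
```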
Formula for the L_p distance?
given vector pairs
a = [a1, a2, a3, …, an]
b = [b1, b2, b3, …, bn]
L_p(a, b) = \left( \sum_{i=1}^{n} |a_i - b_i|^p \right)^{1/p}
=> p-th root of the sum of the absolute differences raised to the power p
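Assuming the formula above, a small helper (hypothetical name lp_distance) could look like this:

```python
import numpy as np

def lp_distance(a, b, p=2):
    # p-th root of the sum of |a_i - b_i|^p; p=2 -> Euclidean, p=1 -> Manhattan
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1 / p)
```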
Training and inference complexity of kNN?
training -> O(1) (simply store the data…)
inference -> O(n) -> compute the distance to all n stored datapoints…
How can one use kNN for regression?
each point has a target value y
-> for a new x, average the y values of the k nearest neighbors…
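A sketch of kNN regression under the same assumptions as above (Euclidean distance, illustrative names):

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k=5):
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to all stored points
    nearest = np.argsort(dists)[:k]                  # k nearest neighbors
    return y_train[nearest].mean()                   # prediction = average of their y values
```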
How to use kNN for unsupervised anomaly detection?
given: unlabeled dataset (xi) -> no labels yi
for all points xi -> calculate MEAN distance to nearest k neighbors
define some threshold C based on this…
=> all xi whose mean distance is smaller than C are normal
=> all points whose mean distance is >= C are anomalies…
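A possible sketch of this scheme; the generated data and the 95th-percentile choice for C are assumed examples, not from the card:

```python
import numpy as np

def mean_knn_distance(X, k=5):
    # pairwise distances between all unlabeled points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sorted_d = np.sort(dists, axis=1)
    # skip column 0 (distance to itself), average over the k nearest neighbors
    return sorted_d[:, 1:k + 1].mean(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[8.0, 8.0]]])  # 100 normal points + 1 outlier
scores = mean_knn_distance(X, k=5)
C = np.percentile(scores, 95)   # assumed threshold choice
is_anomaly = scores >= C        # mean distance >= C -> anomaly
```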
What is DBSCAN?
density-based spatial clustering of applications with noise
=> unsupervised anomaly detection
=> group points in space that lie close to each other
-> points in sparse regions are outliers…
How does DBSCAN work?
given distance epsilon (similar to the threshold C in kNN)
integer m (similar to k in kNN)
core points: at least m other points lie within distance epsilon
directly reachable points: within distance epsilon of core point
outliers: points which are not reachable from any core point…
=> cluster all data into these groups to find the outliers… (which thus do not belong to any cluster…)
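One way to run this in practice is scikit-learn's DBSCAN, where eps corresponds to epsilon and min_samples to m; the data below is made up for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),   # dense cluster 1
               rng.normal(5, 0.3, (50, 2)),   # dense cluster 2
               [[10.0, 10.0]]])               # point in a sparse region

db = DBSCAN(eps=1.0, min_samples=4).fit(X)
labels = db.labels_                        # cluster id per point, -1 = outlier/noise
core_points = X[db.core_sample_indices_]   # the core points
outliers = X[labels == -1]
```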
What are advantages and disadvantages of DBSCAN?
Advantages:
no need to specify the number of clusters beforehand (as in k-means)
can find arbitrary non-linearly separable clusters
requires only two parameters (epsilon, m)
Disadvantages:
not entirely deterministic
border points reachable from two clusters may end up in either cluster… -> depending on the order of data processing…
curse of dimensionality
What are the effects of m and epsilon in DBSCAN?
m
the higher m, the fewer and smaller the clusters
-> the more outliers…
epsilon
too small -> data will not be clustered
too high -> no outliers…
Rule of thumb:
m = 2 * dimensionality (2 times the number of features…)
What is a good way to estimate epsilon in DBSCAN?
find the distance to the nearest neighbor for all points and sort these distances
-> plot the results -> instances on x, distances on y
=> select epsilon where the “change” (elbow) is most pronounced…
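A sketch of this distance plot, using the distance to the single nearest neighbor as described above (dataset and variable names are assumed):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))               # placeholder dataset

nn = NearestNeighbors(n_neighbors=2).fit(X)  # n_neighbors=2: the closest "neighbor" is the point itself
dists, _ = nn.kneighbors(X)
nearest_dist = np.sort(dists[:, 1])          # sorted distance to the nearest real neighbor

plt.plot(nearest_dist)
plt.xlabel("instances (sorted)")
plt.ylabel("distance to nearest neighbor")
plt.show()   # pick epsilon near the elbow, where the change is most pronounced
```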
What is a problem of methods relying on distance measures?
L_p distances break down for high-dimensional data
-> distances between all points become nearly equal, so “nearest” neighbors lose their meaning…
What solutions are there for the curse of dimensionality?
L1 only dampens but does not remove the curse of dimensionality
cosine similarity is not a real alternative, as it satisfies no triangle inequality…!
practical solution:
dimensionality reduction via
autoencoders
PCA
feature selection…
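A minimal sketch of the PCA option, assuming scikit-learn and made-up data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_high = rng.normal(0, 1, (500, 100))               # high-dimensional data where L_p distances degrade

X_low = PCA(n_components=10).fit_transform(X_high)  # project onto 10 principal components
# X_low can then be used with kNN or DBSCAN, where distances are more meaningful
```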