undefined

Buffl

KDD1

by Leonie P.

Major Clustering Approaches

spectral clustering

mean shift

Partitioning Algorithms: Basic Concept

Goal:

Construct a partitioning of a database D of n objects into a set of k (k <= n) clusters minimizing an objective function.

Exhaustively enumerating all possible partitionings into k sets in order to find the global minimum is too expensive.

k-Means Clustering

k-Means Clustering: Algorithm

k-Means: Voronoi Model for Convex Cluster Regions

For a given set of points P = {p1, . . . , pk } (here: cluster representatives), a Voronoi diagram partitions the data space into Voronoi cells, one cell per point
The cell of a point pi covers all points in the data space for which p is the nearest neighbors among the points from P

k-Means: Discussion

k-Means variants

median: ordinal

mode: categorical

medoid: metric (=PAM)

K-Means/Median/Mode/Medoid Clustering: Discussion

Initialization of Partitioning Clustering Methods

Choice of the Parameter k

how to measure the quality of a clustering?

measure should not be monotonic over k
The measures for the compactness of a clustering SSE and TD are monotonously decreasing with increasing value of k

—> Silhouette-Coecient

Quality measure for k-means or k-medoid clusterings that is not monotonic over k.

The Silhouette Coecient

reading the silouhette coefficient

Probabilistic Model-Based Methods

EM- clustering model

Iterative Optimization

EM: Turning the Soft Clustering into a Partitioning

EM: Discussion

find parameter k

If k is too high, the mixture may overfit the data
If k is too low, the mixture may not be flexible enough to approximate the data

Mean shift

Iterative Mode Search

Mean Shift: Core Algorithm

Apply iterative mode search for each data point. Group those that converge to the same mode (called Basin of Attraction).

Mean Shift: Extensions

Weighted Mean

Use different weights for the points in the window, with weights wx , resp. calculated by some kernel

Binning

First quantise data points to grid. Apply iterative mode seeking only once per bin.

Mean Shift: Discussion

Density-Based Clustering

DBSCAN

Density-Reachable vs Density-Connected

DBSCAN Algorithm

Density-Based Spatial Clustering of Applications with Noise

with

Evaluation: based on recursive database traversal.

DBSCAN - find parameters

DBSCAN: Discussion

Spectral Clustering

Find global minimum cut

General Steps

make graph from data
Create adjacency Matrix
Caluclate Eigenvalues and eigenvectors
choose useful number of eigenvalues
apply k means on eingenvectors
map results back to orginal data

Spectral Clustering : Preliminaries

Spectral Clustering: Discussion

Hierarchical Clustering: Basic Notions

result: dendrogramm
Dendrogram can be constructed
- bottom-up (agglomerative approach) or
- top down (divisive approach)

Link Methods - agglomerative clustering

single, complete, and average

Divisive Hierarchical Clustering

Discussion Agglomerative vs. Divisive HC

Density-Based Hierarchical Clustering

Core Distance and Reachability Distance

OPTICS

OPTICS: The Reachability Plot

Hierarchical Clustering: Discussion

Cluster Evaluation - external

Sc : cluster label

SB: ground truth cluster label

Cluster Evaluation external measures

external: needs ground truth

recall, prec, f1, confusion matrix
rand index, Adjusted Rand Index (ARI), Jaccard Coecient
(Shannon) Entropy, Conditional Entropy:
Mutual Information, Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI)

Cluser evaluation - internal measures

no additional information needed

Cohesion
- Average distance between objects of the same cluster
- Cohesion of clustering is equal to weighted mean of the clusters’
  cohesions.
seperation
- Separation between to clusters: Average distance between pairs
- Separation of clustering is equal to weighted mean of the clusters’ separations.

Join Course

Preview

Author

Leonie P.

Information

Last changed
2 years ago

Report course

4-1

Author

Leonie P.

Information