How to clean data
Deletion / Omission
Imputing
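The two cleaning options above can be sketched in base R as follows. This is a minimal illustration, not a prescribed method: the data frame `dat`, its columns, and the choice of mean imputation are assumptions made for the example.

```r
# Hypothetical toy data; the column names are assumptions for illustration.
dat <- data.frame(
  id = 1:5,
  fixation = c(210, NA, 180, 250, NA)
)

# Deletion / omission: drop every row that contains a missing value.
dat_deleted <- na.omit(dat)

# Imputing: replace missing values with the mean of the observed values
# (mean imputation is only one of several possible strategies).
dat_imputed <- dat
observed_mean <- mean(dat$fixation, na.rm = TRUE)
dat_imputed$fixation[is.na(dat_imputed$fixation)] <- observed_mean
```

Deletion shrinks the data set, while imputing keeps all rows but replaces the gaps with estimated values.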
How to detect outliers
rule-based
source: Business / Data understanding
example: "Geißen" is a misspelling of "Gießen", so both stand for the same value
dat$spalte <- replace(dat$spalte, dat$spalte == "Geißen", "Gießen")  # change values
dat[dat$fixation > min.length, ]  # filter rows
cluster-based
source: unsupervised learning
build clusters
look for points far away from cluster
DBSCAN / KMEANS
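The cluster-based steps above (build clusters, then look for points far away from their cluster) might look like this with `kmeans` from base R. The data, the choice of `k = 2`, and the cutoff of mean plus three standard deviations are assumptions for this sketch.

```r
set.seed(1)  # k-means starts from random centers

# Hypothetical data: two tight groups plus one point far from both.
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2),
  c(2.5, 10)
)

# Build clusters (k = 2 is an assumption about the data).
fit <- kmeans(x, centers = 2, nstart = 10)

# Distance of each point to the center of its own cluster.
centers <- fit$centers[fit$cluster, , drop = FALSE]
dist_to_center <- sqrt(rowSums((x - centers)^2))

# Flag points that are unusually far away; the cutoff is an assumed heuristic.
cutoff <- mean(dist_to_center) + 3 * sd(dist_to_center)
outliers <- which(dist_to_center > cutoff)
```

Here `outliers` holds the row indices of the flagged points. DBSCAN follows a different idea: it labels points in low-density regions as noise directly, so no separate distance cutoff is needed.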
regression-based
supervised learning
Regression model
look for high residuals
special: everything outside of μ ± 3σ
least squares model
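A regression-based check along these lines could be sketched with `lm` from base R. The data and the injected outlier are made up for illustration, and applying the μ ± 3σ rule to the residuals is one possible reading of the notes above.

```r
set.seed(42)

# Hypothetical data: a linear relationship with one manipulated observation.
x <- 1:50
y <- 2 * x + rnorm(50, sd = 1)
y[25] <- y[25] + 30  # injected outlier

# Least squares regression model.
fit <- lm(y ~ x)

# Standardize the residuals so the 3-sigma rule can be applied to them.
res <- residuals(fit)
z <- (res - mean(res)) / sd(res)

# Everything outside mu +/- 3 sigma counts as an outlier.
outliers <- which(abs(z) > 3)
```

Here `outliers` contains the indices of observations whose residuals are more than three standard deviations away from the mean residual.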
DBSCAN
useful for finding outliers
KMEANS
unsupervised
Least squares model
supervised