CRISP-DM
all Phases
Business Understanding
Identify and analyze the problem
Descriptive (what is happening), Predictive (what will happen), Prescriptive (what should I do)
Data Understanding
Data collection
Identify data problems
understanding Data, patterns, principles -> wisdom
Steps of Data Understanding
Collect initial data
Explore the data -> visualize
Verify quality
Find outliers -> build normal profile, detect anomalies, statistical model
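A minimal sketch of the outlier step, assuming the "normal profile" is simply the mean and standard deviation of the data and an anomaly is any value far from that profile (z-score). The function name, the sample readings and the threshold are illustrative assumptions, not part of the notes.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)       # normal profile: centre
    sigma = stdev(values)   # normal profile: spread (sample standard deviation)
    if sigma == 0:
        return []           # all values identical, nothing can stand out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(zscore_outliers(readings, threshold=2.0))
```

Note that the outlier itself is part of the profile here, which inflates the standard deviation. With very small samples a lower threshold (as in the example) or a robust statistic such as the median may be needed.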
Data Preparation
Data Consolidation -> Collect, select, integrate
Data Cleaning -> impute missing values, reduce noise, eliminate
Data Transformation -> normalize, aggregate, construct new attributes
Data Reduction -> reduce variables, cases and balance skewed data
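Two of the preparation steps above, sketched under simple assumptions: cleaning by mean imputation (missing values are represented as None) and transformation by min-max normalization to [0, 1]. The helper names and the sample column are made up for illustration. Other imputation or scaling strategies are equally valid.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute column with one missing value.
ages = [20, None, 40, 60]
cleaned = impute_mean(ages)
scaled = min_max_normalize(cleaned)
print(cleaned, scaled)
```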
Modeling
select modelling technique
generate test design
build model
assess model (rank the models)
Evaluation
Evaluate Results
Review Process
Determine next steps -> move on or collect new data
Deployment
Plan Deployment
Plan Monitoring and Maintenance
Produce Final Report
Review Project
Repeatable data mining process implementation
Taxonomy of Data
structured
unstructured
Supervised vs unsupervised learning
Supervised learning
ML task of learning a function that maps an input to an output based on example input-output pairs -> the model needs labeled examples to learn from
Prediction
Classification
Regression
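To illustrate "learning from examples", here is a sketch of a 1-nearest-neighbor classifier on a single numeric feature. It is one of many possible supervised techniques, chosen only because it is short. The labels and numbers are invented.

```python
def nearest_neighbor_predict(examples, x):
    """Predict the label of x by copying the label of the closest example."""
    _, label = min(examples, key=lambda ex: abs(ex[0] - x))
    return label

# Labeled examples: (input, output) pairs the model learns from.
examples = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

print(nearest_neighbor_predict(examples, 2.5))
print(nearest_neighbor_predict(examples, 7.0))
```

Replacing the string labels with numeric targets would turn the same idea into a crude regression.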
Unsupervised learning
Algorithm that learns patterns from unlabeled data -> the model draws its own conclusions
Association
Link analysis
Sequence Analysis
Clustering
Outlier Analysis
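Clustering from the list above can be sketched with a tiny one-dimensional k-means: no labels are given, and the groups emerge from the data alone. The starting centers, the fixed iteration count and the sample points are assumptions for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```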
How to train the model?
k-fold cross-validation
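A sketch of how k-fold cross-validation splits the data: the samples are divided into k folds, and each fold serves once as the test set while the remaining folds form the training set. The function name is hypothetical and the sketch does not shuffle. In practice the data is usually shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(6, 3))
for train, test in splits:
    print("train:", train, "test:", test)
```

The model is trained and scored once per split, and the k scores are averaged to estimate performance.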
Assess model
Interpret the models according to domain knowledge, the data mining success criteria and the desired test design
Judge the success of the application of modeling and discovery techniques more technically
This task considers only the models, whereas the evaluation phase also takes into account all other results produced in the course of the project
What are the formulas for accuracy, recall, precision, and true negative rate?
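The four measures follow from the confusion matrix counts of a binary classifier: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The sketch below states the standard definitions in the comments. The function name and the example counts are made up.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    return {
        # accuracy = (TP + TN) / (TP + FP + TN + FN)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # recall (true positive rate, sensitivity) = TP / (TP + FN)
        "recall": tp / (tp + fn),
        # precision = TP / (TP + FP)
        "precision": tp / (tp + fp),
        # true negative rate (specificity) = TN / (TN + FP)
        "true_negative_rate": tn / (tn + fp),
    }

# Hypothetical counts: 200 cases in total.
metrics = classification_metrics(tp=30, fp=20, tn=140, fn=10)
print(metrics)
```

The sketch assumes every denominator is non-zero. A classifier that never predicts the positive class, for example, would need a guard for precision.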