Key requirements of data mining for trust, efficiency and correctness? And how to achieve them?
Reproducibility
Verifiability: Can results be verified?
=> Formal process with comprehensive documentation, so that less depends on the “art” of the individual analyst
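A minimal sketch of how reproducibility and verifiability could be supported in practice, assuming that each run documents its random seed and a hash of its result. The names `run_experiment` and `fingerprint` and the toy data are hypothetical and only serve as illustration.

```python
import hashlib
import json
import random


def run_experiment(data, seed):
    """Draw a random sample and compute its mean, using an explicit seed."""
    rng = random.Random(seed)        # local generator, no hidden global state
    sample = rng.sample(data, k=5)   # the only random step in this toy run
    return {"seed": seed, "sample": sample, "mean": sum(sample) / len(sample)}


def fingerprint(result):
    """Hash the result so a second party can check they obtained the same output."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = list(range(100))
first = run_experiment(data, seed=42)
second = run_experiment(data, seed=42)

# Same data, same seed and same code give an identical result and fingerprint.
assert first == second
assert fingerprint(first) == fingerprint(second)
```

Recording the seed covers reproducibility; publishing the fingerprint alongside the documentation gives others something concrete to verify against.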
Fayyad's KDD Process?
Fayyad's Knowledge Discovery in Databases Process:
Framework for the process of discovering knowledge from data
-> Data Mining is therefore one key activity within the KDD process
Consists of applying data analysis and discovery algorithms to discover patterns, relationships and other knowledge from preprocessed and transformed data
Data Selection
Preprocessing
Transformation
Data Mining
Interpretation / Evaluation
Gain of knowledge
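The KDD steps above can be sketched as a chain of small functions, one per step. This is only an illustration under stated assumptions: the records, the function names and the "pattern" (total quantity per product) are all hypothetical.

```python
from collections import Counter

# Hypothetical raw records: (customer, product, quantity); None marks a missing value.
raw = [
    ("anna", "Bread", 2),
    ("ben", "milk", None),
    ("anna", "Milk", 1),
    ("cara", "bread", 3),
    ("ben", "BREAD", 1),
]


def select(records):
    """Data selection: keep only the fields relevant to the question."""
    return [(product, qty) for _, product, qty in records]


def preprocess(records):
    """Preprocessing: drop records with missing values."""
    return [(product, qty) for product, qty in records if qty is not None]


def transform(records):
    """Transformation: normalise product names to a common form."""
    return [(product.lower(), qty) for product, qty in records]


def mine(records):
    """Data mining: find a simple pattern, here the total quantity per product."""
    totals = Counter()
    for product, qty in records:
        totals[product] += qty
    return totals


def interpret(totals):
    """Interpretation: turn the pattern into a statement a person can act on."""
    product, qty = totals.most_common(1)[0]
    return f"Best-selling product: {product} ({qty} units)"


knowledge = interpret(mine(transform(preprocess(select(raw)))))
print(knowledge)  # Best-selling product: bread (6 units)
```

Data mining is a single function in this chain, which mirrors the point that it is one activity within the larger process.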
SEMMA?
Sample, Explore, Modify, Model and Assess Process:
-> Similar to Fayyad: Focus only on core data mining processes
Sample: Selection of dataset for modeling
Explore: Understanding of the data (incl. visualization)
Modify: Preparation for data modeling (Preprocessing, Cleaning, Variable selection, etc.)
Model: Application of various modeling (data mining) techniques
Assess: Evaluation of modeling results
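A minimal sketch of the five SEMMA steps on a toy regression problem, using only the Python standard library. The synthetic dataset, the 30/10 split and the choice of single-feature ordinary least squares are assumptions made for illustration, not part of SEMMA itself.

```python
import random
import statistics

# Hypothetical dataset: y is roughly 2*x + 1 with a little noise.
rng = random.Random(0)
data = [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(40)]

# Sample: select a training set and hold out the rest for assessment.
rng.shuffle(data)
train, test = data[:30], data[30:]

# Explore: basic summary statistics of the training data.
xs = [x for x, _ in train]
ys = [y for _, y in train]
x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
x_std = statistics.pstdev(xs)


# Modify: standardise x using statistics from the training set only.
def scale(x):
    return (x - x_mean) / x_std


# Model: fit a line y = a * scale(x) + b by ordinary least squares.
zs = [scale(x) for x in xs]
a = sum(z * (y - y_mean) for z, y in zip(zs, ys)) / sum(z * z for z in zs)
b = y_mean  # the mean of zs is 0, so the intercept equals the mean of y


def predict(x):
    return a * scale(x) + b


# Assess: mean absolute error on the held-out data.
mae = statistics.mean(abs(predict(x) - y) for x, y in test)
print(f"MAE on held-out data: {mae:.3f}")
```

Holding out data in the Sample step is what makes the Assess step meaningful: the error is measured on records the model has not seen.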
CRISP-DM?
Cross-Industry Standard Process for Data Mining:
Business Understanding (Business Objectives, Assessment of Situation, Data Mining Goals and Project Plan)
Data Understanding (Collection of Initial Data, Data Description, Data Exploration, Verification of Data Quality)
Data Preparation (Data Selection, Data Cleaning, Data Construction, Integration of Data, Formatting of Data)
Modeling (Selection of Modeling Technique, Generation of Test Design, Building of model(s), Assessment of model)
Evaluation (Evaluation of Results against Business Success Criteria, Review of Process, Determination of Next Steps)
Deployment (Planning of Deployment, Planning of Monitoring and Maintenance, Production of Final Report, Review of Project)
ASUM-DM?
Analytics Solutions Unified Method for Data Mining/ Predictive Analytics
-> Extension of CRISP-DM
-> Additionally includes Infrastructure/ operations aspects, Project Management, Detailed Deployment
What steps does ASUM-DM include for the deployment?
Create Production Data Files: Load all data needed for operation into production environment
Create and Perform Operational Readiness Testing
Migrate/ Restore QU Model into Production
ISO/IEC 22989?
Standard for artificial intelligence concepts and terminology, which includes:
Terminologies and Definitions
Machine Learning Concepts and algorithms
Most important and generalized Stakeholder roles
AI System Lifecycle (+ Re-evaluation and Retirement of AI Systems)
Which phases listed in CRISP-DM were missing in Fayyad's KDD process?
Business Understanding
Testing in the Modeling phase
Risk categories of the EU AI regulation, with examples?
Unacceptable risk: forbidden
Significant potential to manipulate persons, real-time biometric identification in public spaces, social scoring
High risk: Compliance required
Biometric identification and categorization of natural persons, management and operation of critical infrastructure, education, law enforcement
“Transparency risk”: Information and transparency obligations
Minimal or no risk: Permitted
Types of Biases?
User-to-Data
Data-to-Algorithm
Algorithm-to-User
Human cognitive biases
Data biases
Definition of an attribute?
Measurement/ description of an instance
Attribute Types?
Nominal: Distinct labels from a defined vocabulary
Ordinal: Impose an order on discrete categories (can be distinct labels from a defined vocabulary, numeric or strings)
Interval: Ordered elements with fixed distance in-between (can be distinct labels from a defined vocabulary, numeric (discrete or continuous) or strings)
Ratio: Interval-scaled values with a defined (true) zero point, typically continuous
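The four attribute types differ in which operations are meaningful on their values. The sketch below illustrates this with hypothetical example values; the variable names and the size ordering are assumptions made for illustration.

```python
import statistics

# Hypothetical example values for each attribute type.
colours = ["red", "blue", "red", "green"]   # nominal
sizes = ["S", "M", "L", "M", "M"]           # ordinal
temps_celsius = [10.0, 20.0, 30.0]          # interval (zero point is arbitrary)
weights_kg = [2.0, 4.0, 8.0]                # ratio (true zero)

# Nominal: only equality and counting are meaningful, so use the mode.
most_common_colour = statistics.mode(colours)

# Ordinal: order is meaningful, so a median exists once the ranks are defined.
size_order = {"S": 0, "M": 1, "L": 2}
ranked = sorted(sizes, key=size_order.get)
median_size = ranked[len(ranked) // 2]

# Interval: differences and means are meaningful, ratios are not
# (20 degrees Celsius is not "twice as warm" as 10 degrees Celsius).
mean_temp = statistics.mean(temps_celsius)
temp_difference = temps_celsius[1] - temps_celsius[0]

# Ratio: ratios are meaningful because zero means "none of the quantity".
weight_ratio = weights_kg[1] / weights_kg[0]

print(most_common_colour, median_size, mean_temp, temp_difference, weight_ratio)
# red M 20.0 10.0 2.0
```

Each level allows everything the previous one allows plus one more kind of operation: counting, then ordering, then differences, then ratios.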