What is data governance?
it ensures
quality (accurate, complete)
integrity (consistent, trustworthy)
security (protected, controlled)
usability (accessible, documented)
of an organization’s data
-> spans the entire data lifecycle: from collection to deletion or archiving
The Data Lifecycle
Governance applies at every stage
Data Governance vs Data Management
Data Governance (decision rights & accountability)
traffic laws & zoning regulations
who can take what action
upon what data
using what methods
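One way to make these decision rights concrete is a small policy table that records who can take what action upon what data using what methods. The sketch below is illustrative only: the roles, dataset names, methods, and the `is_allowed` helper are all hypothetical, not taken from any particular governance tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    role: str     # who
    action: str   # can take what action
    dataset: str  # upon what data
    method: str   # using what methods


# Hypothetical policy: each entry is an explicit grant; anything else is denied.
POLICY = {
    Rule("analyst", "read", "sales_orders", "sql_view"),
    Rule("data_steward", "update", "customer_master", "approved_pipeline"),
}


def is_allowed(role: str, action: str, dataset: str, method: str) -> bool:
    """Return True only if this exact combination has been granted."""
    return Rule(role, action, dataset, method) in POLICY


print(is_allowed("analyst", "read", "sales_orders", "sql_view"))
print(is_allowed("analyst", "update", "customer_master", "sql_view"))
```

Denying by default keeps the policy auditable: every permitted action appears as one explicit row.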
Data Management (Operations & Implementation)
the trucks, warehouses, logistic operations
the execution of collecting, processing, and using data effectively
Cost of poor data quality
15-25% of revenue annually
Data Governance: Roles & Responsibilities
Data owner: business executive, accountable for data set policies
Data steward: day-to-day management, implements policies
Data custodian: IT/technical role, storage & security, access control
Every dataset should have a clearly identified owner
6 Dimensions of data quality
7 GDPR principles
Lawfulness, Fairness & Transparency
Purpose Limitation
Data Minimization
Accuracy
Storage Limitation
Integrity & Confidentiality
Accountability
Accountability means you must prove compliance, not just claim it. This is why documentation and audit trails matter
GDPR Data Subject Rights
Response time: You must respond within 1 month.
Your database design must support these requests
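As a rough illustration of what "supporting these requests" can mean at the schema level, the sketch below keys all personal data to one subject id so that an access request and an erasure request each become a single, predictable operation. The `customers` and `orders` tables and both helper functions are hypothetical. Real systems also have to cover backups, logs, and legal retention duties, which are out of scope here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id) ON DELETE CASCADE,
    item TEXT
);
INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com'), (2, 'Bo', 'bo@example.com');
INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'lamp');
""")


def export_subject_data(customer_id: int) -> dict:
    """Right of access: collect everything held about one person."""
    profile = conn.execute(
        "SELECT id, name, email FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    orders = conn.execute(
        "SELECT id, item FROM orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    ).fetchall()
    return {"profile": profile, "orders": orders}


def erase_subject(customer_id: int) -> None:
    """Right to erasure: deleting the customer cascades to linked rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))


print(export_subject_data(1))
erase_subject(1)
print(export_subject_data(1))
```

Because every table that holds personal data references the subject id, neither request requires searching free-text fields or guessing where a person's data lives.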
GDPR penalties
Tier 1: up to €10M or 2% of global annual turnover, whichever is higher
Tier 2: up to €20M or 4% of global annual turnover, whichever is higher
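For a company, the cap in each tier is the higher of the fixed amount and the turnover percentage, so the percentage dominates for large organizations. A minimal sketch of that arithmetic follows. The function name and the example turnover figures are made up for illustration, and the result is the upper bound, not the fine a regulator would actually impose.

```python
# Tier -> (fixed cap in EUR, percentage of global annual turnover)
CAPS = {1: (10_000_000, 2), 2: (20_000_000, 4)}


def max_fine_eur(global_turnover_eur: int, tier: int) -> int:
    """Upper bound of the fine: the higher of the fixed cap and the turnover share."""
    fixed_cap, percent = CAPS[tier]
    turnover_cap = global_turnover_eur * percent // 100  # integer math avoids float rounding
    return max(fixed_cap, turnover_cap)


# Hypothetical company with EUR 1 billion turnover, Tier 2 infringement:
print(max_fine_eur(1_000_000_000, tier=2))  # 40000000, the 4% share exceeds EUR 20M

# Hypothetical company with EUR 100 million turnover, Tier 1 infringement:
print(max_fine_eur(100_000_000, tier=1))    # 10000000, the fixed cap exceeds the 2% share
```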
EU AI Act
Risk based approach
Minimal (spam filters)
Limited Risk (Chatbots)
High Risk (credit scoring, hiring)
Unacceptable (social scoring)
Requirements for high risk AI Systems:
risk management system
high quality training datasets (bias-free, representative)
technical documentation & human oversight
From Data Governance to AI Governance
Additional AI-specific concerns:
model behavior: is it fair? Explainable?
Training data: representative? Bias free?
deployment: Monitored? Auditable?
AI Risk Management Workflow
key AI Risk Categories
Bias & Fairness: Discriminatory outcomes
Data Leakage: PII in prompts/ outputs
Hallucination: Fabricated but credible outputs
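For the data leakage category, one common mitigation is to redact obvious PII before a prompt leaves your system. The sketch below is a deliberately simple illustration: the two regular expressions and the `redact` helper are assumptions, they catch only plain email addresses and phone-like digit runs, and production PII detection needs far broader coverage.

```python
import re

# Simplified patterns for illustration; they will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact(text: str) -> str:
    """Replace matched emails and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


prompt = "Contact Ada at ada@example.com or +49 30 1234567 about her claim."
print(redact(prompt))
```

The same function can be applied to model outputs before they are logged or shown, which covers the "PII in outputs" half of the risk.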
Deep dive: AI Data Leakage
Type 1: Privacy leakage (data exposed through AI systems)
sensitive data in prompts
PII in model outputs
model memorization of training data
Type 2: ML pipeline leakage (model “cheats” during training)
target leakage
train-test contamination
Red flag: If your model performs “too well,” it might be cheating. Always split data before any pre-processing!
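The "split first" rule can be sketched with nothing but the standard library: the scaling parameters are computed from the training split only and then reused, unchanged, on the test split. The synthetic data, the 80/20 split, and the `scale` helper are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)
values = [random.gauss(50, 10) for _ in range(100)]  # synthetic feature values

# 1) Split first, before computing any statistics.
random.shuffle(values)
train, test = values[:80], values[80:]

# 2) Fit the preprocessing parameters on the training split only.
mean = statistics.fmean(train)
std = statistics.stdev(train)


def scale(xs):
    """Standardize using the training mean and standard deviation."""
    return [(x - mean) / std for x in xs]


train_scaled = scale(train)
test_scaled = scale(test)  # the test split never influences mean or std

print(round(statistics.fmean(train_scaled), 6), round(statistics.stdev(train_scaled), 6))
```

Computing `mean` and `std` on all 100 values before splitting would let information from the test rows shape the training features, which is exactly the train-test contamination described above.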
AI Governance Implementation Checklist
start small: pick one high-risk system and govern it well before scaling
Audit Query 1: Find Duplicates (Uniqueness)
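A minimal sketch of such a uniqueness check, run through Python's `sqlite3` module. The `customers` table, its columns, and the sample rows are hypothetical. The pattern is to group by the column that should be unique and keep groups with more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, email TEXT);
INSERT INTO customers VALUES
    (1, 'ada@example.com'),
    (2, 'bo@example.com'),
    (3, 'ada@example.com');
""")

# Any email that appears more than once violates uniqueness.
duplicates = conn.execute("""
    SELECT email, COUNT(*) AS occurrences
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicates)
```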
Audit Query 2: Check Consistency
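One possible consistency check compares the same attribute across two systems that should agree. In this sketch the `crm_customers` and `billing_customers` tables and their rows are hypothetical. Values are trimmed and lower-cased before comparison so that formatting differences alone are not reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm_customers (id INTEGER, email TEXT);
CREATE TABLE billing_customers (id INTEGER, email TEXT);
INSERT INTO crm_customers VALUES
    (1, 'ada@example.com'), (2, 'bo@example.com'), (3, 'Cy@Example.com');
INSERT INTO billing_customers VALUES
    (1, 'ada@example.com'), (2, 'bo@old.example.com'), (3, 'cy@example.com');
""")

# Rows where the two systems disagree about the same customer.
mismatches = conn.execute("""
    SELECT c.id, c.email, b.email
    FROM crm_customers AS c
    JOIN billing_customers AS b ON b.id = c.id
    WHERE LOWER(TRIM(c.email)) <> LOWER(TRIM(b.email))
    ORDER BY c.id
""").fetchall()

print(mismatches)
```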
Audit Query 3: Find Missing Fields (Completeness)
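A completeness check can be sketched as a count of missing values per column, treating both NULL and empty strings as missing. The `customers` table and its sample rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, email TEXT, phone TEXT);
INSERT INTO customers VALUES
    (1, 'ada@example.com', '555-0100'),
    (2, NULL,              '555-0101'),
    (3, '',                NULL),
    (4, 'di@example.com',  NULL);
""")

# Count rows and, per column, how many values are NULL or blank.
total, missing_email, missing_phone = conn.execute("""
    SELECT
        COUNT(*),
        SUM(CASE WHEN email IS NULL OR TRIM(email) = '' THEN 1 ELSE 0 END),
        SUM(CASE WHEN phone IS NULL OR TRIM(phone) = '' THEN 1 ELSE 0 END)
    FROM customers
""").fetchone()

print(total, missing_email, missing_phone)
```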
Audit Query 4: Check Data Validity
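Validity checks test values against the rules they are supposed to obey. The sketch below assumes a hypothetical `customers` table, an allowed age range of 0 to 120, and a very loose email shape test (`something@something.something`). Real rules would come from the data owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, age INTEGER, email TEXT);
INSERT INTO customers VALUES
    (1, 34,  'ada@example.com'),
    (2, -5,  'bo@example.com'),
    (3, 41,  'not-an-email'),
    (4, 230, 'di@example.com');
""")

# Flag rows whose age is out of range or whose email lacks the basic shape.
invalid_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM customers
    WHERE age NOT BETWEEN 0 AND 120
       OR email NOT LIKE '%_@_%._%'
    ORDER BY id
""")]

print(invalid_ids)
```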
Audit Query 5: Detect Potential Bias
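A first-pass bias check compares outcome rates across groups. The sketch below uses a hypothetical `applications` table with made-up approval data and computes the ratio of the lowest to the highest approval rate. The 0.8 threshold is the "four-fifths" rule of thumb, used here as an assumption. A low ratio is a prompt for investigation, not proof of discrimination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applications (grp TEXT, approved INTEGER)")

# Made-up data: group A has 8 of 10 approved, group B has 5 of 10 approved.
rows = [("A", 1)] * 8 + [("A", 0)] * 2 + [("B", 1)] * 5 + [("B", 0)] * 5
conn.executemany("INSERT INTO applications VALUES (?, ?)", rows)

# Approval rate per group (AVG of a 0/1 column is the share of 1s).
rates = dict(conn.execute("""
    SELECT grp, AVG(approved)
    FROM applications
    GROUP BY grp
""").fetchall())

ratio = min(rates.values()) / max(rates.values())
flagged = ratio < 0.8  # heuristic threshold, an assumption for this sketch

print(rates, round(ratio, 3), flagged)
```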
Aggregation for Governance: Summary
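Individual audit results can be rolled up into a per-source summary that a data owner can review at a glance. In this sketch the `records` table, the `source` column, and the sample rows are hypothetical. It reports the row count and email completeness for each source system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (source TEXT, email TEXT);
INSERT INTO records VALUES
    ('crm', 'ada@example.com'), ('crm', NULL),
    ('crm', 'bo@example.com'),  ('crm', 'cy@example.com'),
    ('web', 'di@example.com'),  ('web', NULL);
""")

# COUNT(email) skips NULLs, so COUNT(email) / COUNT(*) is the completeness share.
summary = conn.execute("""
    SELECT
        source,
        COUNT(*) AS total_rows,
        ROUND(100.0 * COUNT(email) / COUNT(*), 1) AS pct_email_complete
    FROM records
    GROUP BY source
    ORDER BY source
""").fetchall()

for source, total_rows, pct in summary:
    print(f"{source}: {total_rows} rows, {pct}% emails present")
```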