Facebook's data warehouse storage
The data warehouse stores upwards of 800 petabytes
Incoming daily rate of about 600 terabytes
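To put the two figures in proportion, a rough calculation may help. This is only a sketch: it assumes decimal units (1 PB = 1,000 TB) and a constant ingest rate, neither of which the notes state.

```python
# Rough scale check for the two figures above.
# Assumptions (not from the notes): decimal units and a constant daily rate.
TB_PER_PB = 1_000

warehouse_pb = 800       # stated warehouse size in petabytes
daily_ingest_tb = 600    # stated incoming rate in terabytes per day

# How many days of ingest at this rate add up to the full warehouse size.
days_to_fill = warehouse_pb * TB_PER_PB / daily_ingest_tb

# What fraction of the warehouse a single day of ingest represents.
daily_share = daily_ingest_tb / (warehouse_pb * TB_PER_PB)

print(f"Days of ingest equal to the warehouse size: {days_to_fill:.0f}")
print(f"One day of ingest as a share of the warehouse: {daily_share:.4%}")
```

Under these assumptions, the stored volume corresponds to roughly three and a half years of ingest at the stated daily rate.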
Real-world impact of data
Netflix’s recommendation engine saves $1B/year from reduced churn
Amazon: 35% of sales come from its recommendation system
Spotify creates personalized playlists for 500M+ users
The difference between guessing and knowing is data
How Netflix stores and processes data
Storage
cloud-based (AWS)
Data Lakes (raw event logs)
Data Warehouses (structured)
distributed across regions
Processing
batch
real-time
ML Models
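The difference between batch and real-time processing can be sketched in a few lines. This is an illustration only, not Netflix's actual pipeline: the event records, field names, and both functions are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical raw event log, as it might sit in a data lake.
events = [
    {"user": "u1", "title": "A"},
    {"user": "u2", "title": "B"},
    {"user": "u1", "title": "A"},
]

def batch_view_counts(log: Iterable[dict]) -> Counter:
    """Batch: process the whole stored log in one pass (e.g. on a schedule)."""
    return Counter(e["title"] for e in log)

def streaming_view_counts(stream: Iterable[dict]) -> Iterator[Counter]:
    """Real-time: update state per event and emit the running result."""
    counts: Counter = Counter()
    for e in stream:
        counts[e["title"]] += 1
        yield counts.copy()  # a snapshot is available after every event

batch_result = batch_view_counts(events)
running_results = list(streaming_view_counts(events))
```

Both paths end with the same counts; the streaming version simply makes an up-to-date answer available after each event rather than only once the full log has been processed.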
Key elements in a data ecosystem
sources
storage
processing
analytics
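One way to picture how the four elements connect is as a chain of stages, each feeding the next. The sketch below is a toy under stated assumptions: the event format, the `watch_minutes` field, and all function names are made up, and an in-memory list stands in for real storage.

```python
import json
import statistics

def source() -> list[str]:
    """Sources: hypothetical app events arriving as JSON strings."""
    return [
        '{"user": "u1", "watch_minutes": 42}',
        '{"user": "u2", "watch_minutes": 17}',
        '{"user": "u1", "watch_minutes": 8}',
    ]

def store(raw_events: list[str]) -> list[dict]:
    """Storage: parse and keep the records (a list stands in for a real store)."""
    return [json.loads(line) for line in raw_events]

def process(records: list[dict]) -> dict[str, int]:
    """Processing: aggregate total watch time per user."""
    totals: dict[str, int] = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0) + r["watch_minutes"]
    return totals

def analyze(totals: dict[str, int]) -> float:
    """Analytics: answer a question, here the mean watch time per user."""
    return statistics.mean(list(totals.values()))

totals = process(store(source()))
mean_minutes = analyze(totals)
```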
Evolution of data storage
Why the evolution?
more data
different data types
faster processing needs
lower costs
File Storage
Simplest form of data storage
What?
csv, Excel, JSON files
Local drives or network shares
When to use?
small datasets
one-off analysis
data exchange between systems
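As an illustration of file-based data exchange, the sketch below reads a CSV export and re-serializes it as JSON using only the Python standard library. The column names and values are hypothetical, and an in-memory string stands in for a file on disk.

```python
import csv
import io
import json

# Hypothetical CSV export from one system.
csv_text = "user_id,plan\n1,basic\n2,premium\n"

# Read rows as dictionaries keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Re-serialize as JSON for a system that expects that format.
json_text = json.dumps(rows, indent=2)
print(json_text)
```

Note that every value comes back as a string (`"1"`, not `1`): a CSV file carries no type information, so the receiving system has to know how to interpret each column.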
Pros
Simple, familiar
no setup required
Cons
no relationships between data
hard to query efficiently
doesn’t scale
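The first two drawbacks of file storage can be made concrete with a small sketch. The two "files" and their columns are hypothetical, and in-memory strings stand in for files on disk. Nothing links the files to each other, so the join is written by hand, and answering one question means reading every row.

```python
import csv
import io

# Two hypothetical files; nothing enforces that orders.user_id exists in users.
users_csv = "user_id,name\n1,Ada\n2,Grace\n"
orders_csv = (
    "order_id,user_id,amount\n"
    "10,1,9.99\n"
    "11,2,4.50\n"
    "12,1,20.00\n"
    "13,3,5.00\n"
)

# Build a lookup by hand, since the files carry no relationship of their own.
users = {row["user_id"]: row["name"] for row in csv.DictReader(io.StringIO(users_csv))}

# "Query": total spend per user name. Every order row is read (a full scan).
totals: dict[str, float] = {}
orphans = []
for order in csv.DictReader(io.StringIO(orders_csv)):
    name = users.get(order["user_id"])
    if name is None:
        orphans.append(order["order_id"])  # references a user that does not exist
        continue
    totals[name] = totals.get(name, 0.0) + float(order["amount"])
```

Order 13 points at a user that is not in the users file, and nothing in the file format prevents that; the script can only detect it after the fact.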
Relational Data