Introducing the Cloud Data Platform

Cloud Data

by lennart M.

Structure of a data platform

A combination of data warehouse, data lake, and data hub

What is a data warehouse?

A single physical machine

It includes:

ETL (extract, transform, load), data storage, processing, SQL

Problem: physical limits when adapting to new challenges
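To make the ETL part of a warehouse concrete, here is a minimal sketch, assuming an in-memory SQLite database stands in for the warehouse. The CSV content, the table name `orders`, and the helper names `extract`, `transform`, and `load` are made up for illustration and do not come from the course.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the column names are invented for this sketch.
RAW_CSV = """order_id,customer,amount
1,alice,19.90
2,bob,5.00
3,alice,
"""


def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and convert the types."""
    return [
        (int(r["order_id"]), r["customer"], float(r["amount"]))
        for r in rows
        if r["amount"]
    ]


def load(conn, records):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(conn, transform(extract(RAW_CSV)))

# Once loaded, the structured data is queried with SQL.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

The sketch only works because the input is structured: every row has the same columns, which is exactly the assumption a warehouse makes.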

The 5 V’s of big data, and the ones data warehouses struggle with

Variety

Diversity of data types
Social media, IoT, and SaaS create new types of data
A data warehouse only works with structured data
A warehouse restricts users to its built-in processing language, which limits the processing of new types of data

unstructured data

no predefined structure
Video, images, social media, etc.

Volume

A warehouse couples storage and processing, which limits scalability and flexibility
Scaling requires new hardware, which is expensive

Velocity

How quickly data is generated and how quickly it can be moved
A warehouse works batch-oriented: data is processed in groups rather than in real time
Processes are delayed until the complete batch has been analyzed
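The difference between batch and real-time processing can be sketched in a few lines. This is an illustration only: the function names `process_batch` and `process_stream` and the sample readings are assumptions, and a real streaming system would read from a message queue rather than a list.

```python
from typing import Iterable, Iterator, List


def process_batch(events: List[float]) -> float:
    """Batch: the result exists only after the whole batch is processed."""
    return sum(events) / len(events)


def process_stream(events: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated running average after every event."""
    total = 0.0
    count = 0
    for value in events:
        total += value
        count += 1
        yield total / count


readings = [10.0, 20.0, 30.0]  # hypothetical sensor readings

batch_result = process_batch(readings)           # one answer, at the end
stream_results = list(process_stream(readings))  # one answer per event

print(batch_result)
print(stream_results)
```

In the batch version nothing is known until all readings have arrived; in the streaming version an intermediate answer is available after the first reading.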

Data lakes to the rescue?

Data Lake (Hadoop)

Stores non-relational data (NoSQL - not only SQL)

RDBMS - relational database management system

Stores data in rows and columns

NoSQL - not only SQL

Stores data in a variety of data models

Hadoop is complicated to manage
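As a rough illustration of the two storage styles, the sketch below puts one record into a relational table and a second record into a document-style store. It assumes SQLite for the relational side and a plain dictionary of JSON documents as a stand-in for a document database, which is only one of several NoSQL models; all names and values are made up.

```python
import json
import sqlite3

# Relational model: a fixed schema of rows and columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'Berlin')")
row = conn.execute("SELECT name, city FROM users WHERE id = 1").fetchone()

# Document model: each record is a self-describing document, and
# documents in the same collection need not share the same fields.
documents = {
    "user:1": {"name": "Ada", "city": "Berlin"},
    "user:2": {"name": "Lin", "tags": ["iot", "video"], "profile": {"lang": "de"}},
}

# Documents are typically stored and exchanged in a serialized form.
serialized = json.dumps(documents["user:2"])
restored = json.loads(serialized)

print(row)
print(restored)
```

The second document has nested and list-valued fields and no `city`, which a fixed row-and-column schema would not accept without a schema change.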

Cloud

Elastic resources:
- Amount of resources exactly as you wish
Modularity:
- Storage and compute are separated in a cloud
Pay per use:
- You only pay for what you use
Instant availability:
- Ordering and deploying a cloud service takes minutes

Data Lake - pros and cons

+ cost-effectively handles an almost unlimited variety, volume, and velocity of data

+ enables real-time analysis

- not organised

- ungoverned: no standards or guidelines that ensure high data quality

-> Usually a warehouse is coupled with a lake: the warehouse uses governed data, while the lake is used for data exploration (e.g. by data scientists)

cost-effective, flexible, and capable of ingesting, integrating, transforming, and managing all the V’s

Ingestion Layer

Gets data into the platform from sources such as:

relational or NoSQL databases, file storage, or internal or third-party APIs
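One way to picture an ingestion layer is a set of small readers, one per source type, that all deliver records into the same landing area. The sketch below assumes three stand-in sources: an in-memory SQLite table, a string that plays the role of a JSON Lines file, and a stub in place of an API response. No real API is called, and every name here is hypothetical.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


def from_database() -> Iterator[Record]:
    """Relational source: rows from a table (in-memory stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    for cid, name in conn.execute("SELECT id, name FROM customers"):
        yield {"id": cid, "name": name}


def from_file() -> Iterator[Record]:
    """File source: a string stands in for a JSON Lines file."""
    lines = '{"sensor": "s1", "temp": 21.5}\n{"sensor": "s2", "temp": 19.0}'
    for line in lines.splitlines():
        yield json.loads(line)


def from_api() -> Iterator[Record]:
    """API source: a stub instead of a real HTTP response."""
    yield {"ticket": 42, "status": "open"}


SOURCES: Dict[str, Callable[[], Iterator[Record]]] = {
    "database": from_database,
    "file": from_file,
    "api": from_api,
}


def ingest() -> List[Record]:
    """Land every record unchanged, tagged with the source it came from."""
    landed = []
    for name, read in SOURCES.items():
        for payload in read():
            landed.append({"source": name, "payload": payload})
    return landed


landed = ingest()
for item in landed:
    print(item)
```

The records are landed as they are; shaping them into a common schema is left to the processing layer.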

Storage Layer

Must be able to store large volumes and many different kinds of data
Cloud storage is elastic

Processing Layer

Data is transformed to make it easier to work with:
- Schema management
- Data cleaning
- Data validation
- Production of new data products
Done in SQL or with processing frameworks (e.g. Apache Spark, Apache Beam, Apache Flink)
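The four processing tasks can be sketched in plain Python. This is a simplified illustration rather than how Spark, Beam, or Flink would express it; the schema, the plausible temperature range, and the sample records are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical target schema: field name -> type to cast to.
SCHEMA = {"sensor": str, "temp_c": float}


def apply_schema(raw: dict) -> Optional[dict]:
    """Schema management: keep known fields and cast them, or reject."""
    try:
        return {field: cast(raw[field]) for field, cast in SCHEMA.items()}
    except (KeyError, ValueError, TypeError):
        return None


def clean(record: dict) -> dict:
    """Data cleaning: normalize the sensor identifier."""
    return {**record, "sensor": record["sensor"].strip().lower()}


def is_valid(record: dict) -> bool:
    """Data validation: accept only an assumed plausible range."""
    return -50.0 <= record["temp_c"] <= 60.0


def process(raw_records: List[dict]) -> Dict[str, float]:
    """New data product: average temperature per sensor."""
    values: Dict[str, List[float]] = {}
    for raw in raw_records:
        record = apply_schema(raw)
        if record is None:
            continue
        record = clean(record)
        if not is_valid(record):
            continue
        values.setdefault(record["sensor"], []).append(record["temp_c"])
    return {sensor: sum(v) / len(v) for sensor, v in values.items()}


raw_records = [
    {"sensor": " S1 ", "temp_c": "20.0"},        # needs casting and cleaning
    {"sensor": "s1", "temp_c": 22.0},
    {"sensor": "s2", "temp_c": "not a number"},  # fails the schema
    {"sensor": "s3", "temp_c": 999},             # fails validation
    {"temp_c": 18.0},                            # missing field
]

averages = process(raw_records)
print(averages)
```

Only the two `s1` readings survive all steps, so the resulting data product contains a single average.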

Serving Layer

Gives other applications access to the data lake

Summary

Last changed
2 years ago

Report course