Big Data & Data Science

by Dominik P.

Big Data Analytics vs. Traditional Data Analytics

Big Data Analysis requires different tools and methodologies to deal with data Volume, Velocity and Variety.

However, it follows the same underlying process as traditional data analysis:
- Prepare: data retrieval, cleansing and transformation
- Examine: data exploration and visualization
- Model: data mining and knowledge discovery
Both have the common goal of highlighting useful information, suggesting conclusions and supporting decision making.
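
The three stages above can be sketched on a toy dataset. This is only an illustration, not a prescribed toolchain: the records, the field names and the functions `prepare`, `examine` and `model` are assumptions made up for this example, and the "model" step is a plain aggregation standing in for real data mining.

```python
from statistics import mean

# Hypothetical raw records; field names are assumptions for illustration.
raw = [
    {"customer": "a", "amount": "10.5"},
    {"customer": "b", "amount": ""},      # missing value, dropped during cleansing
    {"customer": "a", "amount": "4.5"},
    {"customer": "c", "amount": "20"},
]

def prepare(records):
    """Prepare: cleanse (drop empty amounts) and transform (string to float)."""
    return [
        {"customer": r["customer"], "amount": float(r["amount"])}
        for r in records
        if r["amount"]
    ]

def examine(records):
    """Examine: summary statistics as a stand-in for exploration and visualization."""
    amounts = [r["amount"] for r in records]
    return {"count": len(amounts), "mean": mean(amounts), "max": max(amounts)}

def model(records):
    """Model: total spend per customer as a stand-in for data mining."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

clean = prepare(raw)
summary = examine(clean)
totals = model(clean)
print(summary, totals)
```

With Big Data the stages stay the same; what changes is the tooling needed to run them at scale.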

Describe Top-Down and Bottom-Up

Name the Big Data Analytics Strategy

Why Data Lakes?

Start with end-user requirements to identify desired reports and analysis
Define corresponding database schema and queries
Identify the required data sources
Create an Extract-Transform-Load (ETL) pipeline to extract the required data (curation) and transform it to the target schema (‘schema-on-write’)
Create reports. Analyze data
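
A minimal sketch of this top-down, schema-on-write flow, using Python's built-in `sqlite3` module as a stand-in for a data warehouse. The table name `sales`, the source fields and the `etl` helper are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical source rows; only the fields the report needs are curated.
source_rows = [
    {"order_id": 1, "region": "north", "amount": "100.0", "note": "gift"},
    {"order_id": 2, "region": "south", "amount": "50.0", "note": ""},
    {"order_id": 3, "region": "north", "amount": "25.0", "note": "rush"},
]

conn = sqlite3.connect(":memory:")
# The target schema is fixed up front, before any data is loaded.
conn.execute(
    "CREATE TABLE sales ("
    "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)

def etl(rows, conn):
    """Extract the required fields, transform types, load into the schema."""
    transformed = [(r["order_id"], r["region"], float(r["amount"])) for r in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)
    conn.commit()

etl(source_rows, conn)

# The report query was known in advance and drove the schema design.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

Note that the `note` field never reaches the database: anything outside the predefined schema is discarded at load time.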

Describe "Why Data Lakes?": new big data thinking, all data has value

All data has potential value
Data hoarding
No defined schema: data is stored in its native format
Schema is imposed and transformations are done at query time (schema-on-read).
Apps and users interpret the data as they see fit
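
Schema-on-read can be sketched as follows, assuming the lake holds raw JSON lines with no shared structure. The records and the two reader functions are hypothetical; each consumer imposes its own schema only at query time.

```python
import json

# Raw events kept in their native format (JSON lines), with no shared schema.
raw_lines = [
    '{"user": "a", "event": "click", "ts": 1}',
    '{"user": "b", "event": "purchase", "amount": 30.0, "ts": 2}',
    '{"device": "sensor-1", "temp_c": 21.5, "ts": 3}',
    '{"user": "a", "event": "purchase", "amount": 12.0, "ts": 4}',
]

def read_purchases(lines):
    """One consumer's view: impose a (user, amount) schema at query time."""
    for line in lines:
        record = json.loads(line)
        if record.get("event") == "purchase":
            yield record["user"], float(record["amount"])

def read_temperatures(lines):
    """Another consumer interprets the same raw data differently."""
    for line in lines:
        record = json.loads(line)
        if "temp_c" in record:
            yield record["device"], record["temp_c"]

purchases = list(read_purchases(raw_lines))
temperatures = list(read_temperatures(raw_lines))
print(purchases, temperatures)
```

Nothing is discarded when the data is stored, so a new question can be answered later by writing a new reader rather than rebuilding a pipeline.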

Describe "Why Data Lakes?": in a data lake, the approach is always bottom-up

Describe the old way and the new way

Comparison of storage approaches

Metadata – File vs. Object

Unstructured Data & Object Storage

Metadata values are specific to each individual type.
Enables automated management of content.
Ensure integrity, retention and authenticity
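
As a rough illustration of how per-object metadata enables automated management and integrity checks, the sketch below models an object as payload plus custom metadata plus a content hash. The `StoredObject` class, the `retain_until` metadata key and the `expired` helper are assumptions for this example and do not correspond to any particular object store's API.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    """Hypothetical object: payload plus custom metadata, addressed by key."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)
    checksum: str = ""

    def __post_init__(self):
        # Record a content hash at write time so integrity can be checked later.
        if not self.checksum:
            self.checksum = hashlib.sha256(self.data).hexdigest()

    def is_intact(self):
        """Integrity: the stored bytes still match the recorded hash."""
        return hashlib.sha256(self.data).hexdigest() == self.checksum

def expired(obj, today):
    """Retention: decide from metadata alone (ISO dates compare as strings)."""
    return obj.metadata.get("retain_until", "9999-12-31") < today

xray = StoredObject("scans/0001", b"image-bytes",
                    {"type": "x-ray", "retain_until": "2030-01-01"})
mail = StoredObject("mail/0001", b"hello",
                    {"type": "email", "retain_until": "2020-01-01"})

print(xray.is_intact(), expired(mail, "2024-06-01"), expired(xray, "2024-06-01"))
```

The metadata keys differ per object type, which is what lets a policy engine treat an x-ray and an email differently without inspecting the payload.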

Object Storage is good fit for

Unstructured data workloads
Capacity requirements beyond 100s of TBs
Distributed access to content
Data archiving: documents, emails, backups etc.
Storage for photos, videos, virtual machine images

SQL vs. Big Data

Name the advantages and disadvantages of SQL

SQL vs. NoSQL

CAP theorem (Consistency, Availability and Partition-tolerance)

Consistency - “Is the data I am looking at now the same if I look at it somewhere else?”

Availability - “What will happen if my database goes down?”
Partition tolerance - “What if my data is on a different node?”

Name the advantages and disadvantages of NoSQL databases

Big Data Use Case: 360-degree customer view

A 360-degree customer view is the attempt to get a complete view of customers by combining data from various touch points, such as marketing and the purchasing process. Businesses use a 360-degree customer view to drive better engagement, more revenue, and long-term loyalty.
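
Combining touch points can be sketched as a join on a shared customer identifier. The three sources, their fields and the `build_view` function below are hypothetical; real systems also have to resolve identities that do not share a clean key.

```python
from collections import defaultdict

# Hypothetical touch-point data, all keyed by an assumed shared customer_id.
marketing = [
    {"customer_id": 1, "campaign": "spring"},
    {"customer_id": 2, "campaign": "newsletter"},
]
purchases = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 15.0},
]
support = [{"customer_id": 2, "ticket": "late delivery"}]

def build_view(marketing, purchases, support):
    """Merge all touch points into one profile per customer."""
    view = defaultdict(lambda: {"campaigns": [], "total_spend": 0.0, "tickets": []})
    for m in marketing:
        view[m["customer_id"]]["campaigns"].append(m["campaign"])
    for p in purchases:
        view[p["customer_id"]]["total_spend"] += p["amount"]
    for s in support:
        view[s["customer_id"]]["tickets"].append(s["ticket"])
    return dict(view)

view = build_view(marketing, purchases, support)
print(view[1], view[2])
```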

Big Data Use Case: Sentiment analysis

Sentiment analysis is the process of determining whether content demonstrates a positive, neutral, or negative feeling towards the subject of the content. It relies on analyzing information, such as social media feeds, customer reviews, emails, forum posts, customer feedback, and so on. It uses natural language processing and computational linguistics.
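
A deliberately simple sketch of the positive / neutral / negative decision, using a tiny hand-made word list. This lexicon approach is a swapped-in simplification, not the natural language processing the card refers to; the word sets and the `sentiment` function are assumptions for illustration.

```python
import re

# Tiny hand-made lexicon; real systems use far larger resources or trained models.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def sentiment(text):
    """Classify text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in [
    "I love this product, great support",
    "Terrible and slow delivery",
    "The package arrived on Tuesday",
]:
    print(review, "->", sentiment(review))
```

Word counting ignores negation and sarcasm ("not good" would score as positive), which is one reason production systems rely on proper language models.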

Big Data Use Case: Fraud detection

Fraud detection is the process of identifying anomalies in patterns of behavior that signal potential fraud. Today, fraud detection can involve analyzing large volumes of data, such as:

Transactions
Authorization information
Buying patterns

For example, it is used by:

Credit card companies to prevent unauthorized purchases that don’t match a customer’s profile.
Financial service businesses to prevent illegal financial transactions.
Technology businesses to prevent unauthorized access to products and services, such as email.
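
The idea of a purchase that does not match a customer's profile can be sketched as a basic anomaly check: flag an amount that lies far from the customer's historical mean. The z-score rule, the threshold of 3 and the `is_suspicious` function are assumptions chosen for illustration; real fraud systems combine many more signals.

```python
from statistics import mean, stdev

def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough history to build a profile
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Hypothetical past transaction amounts for one customer.
history = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0]

print(is_suspicious(history, 24.0))
print(is_suspicious(history, 500.0))
```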

Name the Domains of Data Science

Last modified: 2 years ago