Big Data Analytics vs. Traditional Data Analytics
Big Data Analysis requires different tools and methodologies to deal with data Volume, Velocity and Variety.
However, it follows the same underlying process as traditional data analysis:
Prepare: data retrieval, cleansing and transformation
Examine: data exploration and visualization
Model: data mining and knowledge discovery
Both have the common goal of highlighting useful information, suggesting conclusions and supporting decision making.
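The three shared steps (prepare, examine, model) can be sketched in Python on a tiny dataset. The record fields, the cleansing rule, and the "model" (finding the best-selling item) are hypothetical illustrations. At big data scale each step would run on distributed tools rather than on plain Python lists, but the order of the steps stays the same.

```python
from statistics import mean

# Hypothetical raw records: prices arrive as strings and one value is missing (None).
raw = [
    {"item": "A", "price": "10.5", "qty": 2},
    {"item": "B", "price": None, "qty": 1},
    {"item": "A", "price": "9.5", "qty": 4},
    {"item": "C", "price": "20.0", "qty": 1},
]

def prepare(records):
    """Prepare: cleanse (drop incomplete records) and transform (cast prices to float)."""
    return [{**r, "price": float(r["price"])} for r in records if r["price"] is not None]

def examine(records):
    """Examine: simple descriptive statistics as a first exploration of the data."""
    prices = [r["price"] for r in records]
    return {"count": len(records), "mean_price": mean(prices), "max_price": max(prices)}

def model(records):
    """Model: a trivial 'discovery', the item with the highest total quantity sold."""
    totals = {}
    for r in records:
        totals[r["item"]] = totals.get(r["item"], 0) + r["qty"]
    return max(totals, key=totals.get)

clean = prepare(raw)
print(examine(clean))
print(model(clean))  # A
```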
Describe the top-down and bottom-up approaches
Name the big data analytics strategy
Why Data Lakes?
Start with end-user requirements to identify desired reports and analysis
Define corresponding database schema and queries
Identify the required data sources
Create an Extract-Transform-Load (ETL) pipeline to extract the required data (curation) and transform it to the target schema (‘schema-on-write’)
Create reports and analyze the data
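A minimal Python sketch of this top-down, schema-on-write flow follows. The schema is fixed first (derived from the desired report). The ETL step keeps only the schema's columns and casts them, and it rejects rows that do not fit before anything is loaded. The column names, the CSV source and the report are hypothetical.

```python
import csv
import io

# Hypothetical target schema derived from the desired report: column -> type.
TARGET_SCHEMA = {"customer_id": int, "country": str, "revenue": float}

# Hypothetical source extract with an extra column the report does not need.
SOURCE_CSV = """customer_id,country,revenue,notes
1,DE,100.0,vip
2,FR,abc,
3,DE,50.5,
"""

def extract(text):
    """Extract: read rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: keep only schema columns and cast them; reject rows that don't fit."""
    loaded, rejected = [], []
    for row in rows:
        try:
            loaded.append({col: typ(row[col]) for col, typ in TARGET_SCHEMA.items()})
        except (KeyError, ValueError):
            rejected.append(row)  # e.g. revenue "abc" cannot be cast to float
    return loaded, rejected

def load(table, rows):
    """Load: append conforming rows to the target table."""
    table.extend(rows)

warehouse_table = []
rows, rejected = transform(extract(SOURCE_CSV))
load(warehouse_table, rows)

# Report: revenue per country, computed against the predefined schema.
report = {}
for r in warehouse_table:
    report[r["country"]] = report.get(r["country"], 0.0) + r["revenue"]
print(report)  # {'DE': 150.5}
```

Because the schema is enforced on write, the malformed FR row never reaches the table. A new question that needs the dropped `notes` column would require changing the schema and the pipeline.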
Describe: Why Data Lakes? – New big data thinking: all data has value
All data has potential value
Data hoarding
No defined schema: data is stored in its native format
Schema is imposed and transformations are done at query time (schema-on-read).
Apps and users interpret the data as they see fit
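For contrast, here is a schema-on-read sketch in Python. The "lake" stores every record as raw JSON text without validation. Two hypothetical consumers each impose their own schema only when they query. The record shapes and function names are illustrative assumptions, not a specific data lake product.

```python
import json

# The "lake": raw records kept in their native form (JSON text), no schema on write.
lake = []

def ingest(raw_line):
    """Store data as-is; nothing is validated or transformed at write time."""
    lake.append(raw_line)

ingest('{"user": "u1", "event": "click", "ts": 1}')
ingest('{"user": "u2", "event": "purchase", "amount": "19.99", "ts": 2}')
ingest('{"sensor": "t-7", "temp_c": 21.5}')  # completely different shape, still accepted

def purchases(lines):
    """One consumer's schema, imposed at query time: (user, amount as float)."""
    for line in lines:
        rec = json.loads(line)
        if rec.get("event") == "purchase":
            yield rec["user"], float(rec["amount"])

def temperatures(lines):
    """Another consumer interprets the same raw data differently."""
    for line in lines:
        rec = json.loads(line)
        if "temp_c" in rec:
            yield rec["sensor"], rec["temp_c"]

print(list(purchases(lake)))     # [('u2', 19.99)]
print(list(temperatures(lake)))  # [('t-7', 21.5)]
```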
Describe: Why Data Lakes? – Data Lake: the approach is always bottom-up
Describe the old way and the new way
Comparison of storage approaches
Metadata – File vs. Object
Unstructured Data & Object Storage
Metadata values are specific to each individual type.
Enables automated management of content.
Ensures integrity, retention and authenticity.
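The sketch below is a toy in-memory object store in Python that illustrates these three points. Each object carries its own type-specific metadata. A stored content hash supports integrity checks. A retention date drives automated management. The class, its methods and the metadata keys are hypothetical and do not correspond to any real object storage API.

```python
import hashlib
from datetime import datetime, timedelta, timezone

class ObjectStore:
    """Hypothetical in-memory object store: flat namespace of key -> (data, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data: bytes, **metadata):
        # Custom, type-specific metadata is stored together with the object,
        # plus a SHA-256 content hash for later integrity checks.
        meta = dict(metadata)
        meta["sha256"] = hashlib.sha256(data).hexdigest()
        self._objects[key] = (data, meta)

    def get(self, key):
        # Verify the content against the stored hash before returning it.
        data, meta = self._objects[key]
        if hashlib.sha256(data).hexdigest() != meta["sha256"]:
            raise ValueError(f"integrity check failed for {key}")
        return data, meta

    def expired(self, now):
        """Automated management: list objects whose retention period has ended."""
        return [k for k, (_, m) in self._objects.items()
                if "retain_until" in m and m["retain_until"] < now]

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = ObjectStore()
store.put("mail/0001.eml", b"Subject: hi", content_type="message/rfc822",
          sender="a@example.com", retain_until=now - timedelta(days=1))
store.put("photos/cat.jpg", b"\xff\xd8...", content_type="image/jpeg",
          camera="X100", retain_until=now + timedelta(days=365))
print(store.expired(now))  # ['mail/0001.eml']
```

The email carries a sender field and the photo carries a camera field. That per-object metadata is what distinguishes object storage from a file system's fixed attributes.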
Object storage is a good fit for:
Unstructured data workloads
Capacity requirements beyond 100s of TBs
Distributed access to content
Data archiving: documents, emails, backups etc.
Storage for photos, videos, virtual machine images
SQL vs. Big Data
List the advantages and disadvantages of SQL
SQL vs. NoSQL
CAP theorem (Consistency, Availability and Partition-tolerance)
Consistency - “Is the data I am looking at now the same if I look at it somewhere else”
Availability - “What will happen if my database goes down?”
Partition tolerance - “What if my data is on a different node that I cannot reach?”
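The trade-off can be illustrated with a deliberately simplified two-replica store in Python. When a network partition separates the replicas, the system keeps running (partition tolerance) but has to give something up. A "CP" configuration refuses writes and sacrifices availability. An "AP" configuration accepts them and lets the replicas diverge, which sacrifices consistency. The classes and the mode names are hypothetical teaching aids, not how any particular database is implemented.

```python
class Replica:
    def __init__(self):
        self.data = {}

class TwoReplicaStore:
    """Toy two-node store illustrating the CAP trade-off during a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True = node a cannot reach node b

    def write(self, key, value):
        """Write via node a and replicate to node b if it is reachable."""
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than let replicas diverge.
                raise RuntimeError("unavailable: cannot reach replica b")
            # Availability first: accept locally; b keeps serving the old value.
            self.a.data[key] = value
            return
        self.a.data[key] = value
        self.b.data[key] = value

    def read_from_b(self, key):
        return self.b.data.get(key)

ap = TwoReplicaStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)            # accepted on a only
print(ap.read_from_b("x"))  # 1 (available, but stale)

cp = TwoReplicaStore("CP")
cp.write("x", 1)
cp.partitioned = True
try:
    cp.write("x", 2)
except RuntimeError as e:
    print(e)                # unavailable: cannot reach replica b
print(cp.read_from_b("x"))  # 1 (consistent with a, but the write was rejected)
```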
List the advantages and disadvantages of NoSQL databases
Big Data Use Case: 360-degree customer view
A 360-degree customer view is the attempt to get a complete view of customers by combining data from various touch points, such as marketing and the purchasing process. Businesses use a 360-degree customer view to drive better engagement, more revenue, and long-term loyalty.
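The core operation, combining data from several touch points into one profile per customer, can be sketched in Python. The sketch assumes that every source already shares a common `customer_id`. In practice, matching the same customer across systems (identity resolution) is often the hard part. The source names and fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical touch-point extracts, all keyed by a shared customer_id.
marketing = [{"customer_id": 1, "campaign": "spring_sale", "clicked": True},
             {"customer_id": 2, "campaign": "newsletter", "clicked": False}]
purchases = [{"customer_id": 1, "order_total": 120.0},
             {"customer_id": 1, "order_total": 30.0}]
support = [{"customer_id": 2, "ticket": "late delivery"}]

def customer_360(marketing, purchases, support):
    """Combine all touch points into one profile per customer."""
    view = defaultdict(lambda: {"campaigns": [], "orders": 0, "revenue": 0.0, "tickets": []})
    for m in marketing:
        view[m["customer_id"]]["campaigns"].append(m["campaign"])
    for p in purchases:
        profile = view[p["customer_id"]]
        profile["orders"] += 1
        profile["revenue"] += p["order_total"]
    for s in support:
        view[s["customer_id"]]["tickets"].append(s["ticket"])
    return dict(view)

profiles = customer_360(marketing, purchases, support)
print(profiles[1])  # {'campaigns': ['spring_sale'], 'orders': 2, 'revenue': 150.0, 'tickets': []}
```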
Big Data Use Case: Sentiment analysis
Sentiment analysis is the process of determining whether content demonstrates a positive, neutral, or negative feeling towards the subject of the content. It relies on analyzing information such as social media feeds, customer reviews, emails, forum posts, customer feedback, and so on. It uses natural language processing and computational linguistics.
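A very simple lexicon-based approach to sentiment analysis can be sketched in Python. Each text is tokenized, the sentiment scores of known words are summed, and a word's polarity is flipped when the word directly follows a negation. The tiny lexicon is a hypothetical stand-in. Real systems rely on much richer NLP models or curated sentiment lexicons.

```python
import re

# Hypothetical toy lexicon; real systems use NLP models or curated lexicons.
LEXICON = {"great": 1, "love": 1, "good": 1, "bad": -1, "terrible": -1, "slow": -1}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Classify text as positive, neutral or negative by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        # Flip the polarity if the previous token is a negation ("not good").
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = ["I love this phone, great camera!",
           "Delivery was slow and support was terrible.",
           "The box arrived on Tuesday.",
           "Not good."]
print([sentiment(r) for r in reviews])
# ['positive', 'negative', 'neutral', 'negative']
```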
Big Data Use Case: Fraud detection
Fraud detection is the process of identifying anomalies in patterns of behavior that signal potential fraud. Today, fraud detection can involve analyzing large volumes of data, such as:
Transactions
Authorization information
Buying patterns
For example, it’s used by:
Credit card companies to prevent unauthorized purchases that don’t match a customer’s profile.
Financial service businesses to prevent illegal financial transactions.
Technology businesses to prevent unauthorized access to products and services, such as email.
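Detecting purchases that "don't match a customer's profile" can be illustrated with a single statistical rule in Python. The rule flags a new amount whose z-score against the card's past amounts exceeds a threshold. The history, the threshold of 3.0 and the function name are hypothetical. Real fraud detection combines many signals (authorization data, location, merchant, timing) and usually relies on trained models rather than one hand-set rule.

```python
from statistics import mean, stdev

# Hypothetical purchase history per card (amounts in EUR).
history = {"card_1": [25.0, 30.0, 27.5, 22.0, 31.0, 28.0]}

def is_suspicious(card, amount, threshold=3.0):
    """Flag an amount whose z-score against the card's past amounts exceeds threshold."""
    past = history[card]
    mu, sigma = mean(past), stdev(past)
    if sigma == 0:
        # All past amounts identical: any different amount is an anomaly.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

print(is_suspicious("card_1", 29.0))   # False (close to the usual amounts)
print(is_suspicious("card_1", 950.0))  # True (far outside the card's profile)
```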
Name the domains of data science