Definition Business Intelligence
Processes, technologies and tools that are needed to turn data into information, information into knowledge, and knowledge into wisdom or plans that drive profitable business action.
OLAP?
Online Analytical Processing
Describe an analytics-based organization
Analytics as a necessity for success and survival
Data-driven business decisions rather than intuition
Entire business driven by analytics
3 types of analytics
Descriptive Analytics: Display data that is relevant
Predictive Analytics: Discover insights in data
Prescriptive Analytics: Compute best decisions and present them as recommendations
Types of Information Integration
Distributed DB: Distributed queries across multiple databases -> Slow execution!
Centralized DB: 1 single centralized DB -> Slow in operative use
=> Need an enterprise-wide data model / architecture
Challenges of Information Integration?
Technical heterogeneity: different systems, communication systems, architectures
Structural and syntactic heterogeneity: different encoding, units, synonyms
Semantic heterogeneity: different interpretation of data
Data Silos?
Data silos contain data that is only available to a specific part of the organization
Approaches to address challenges of Information Integration?
Federation: Point-to-Point Integration
-> n applications/ data sources -> on the order of n^2 connections (every source may need a link to every other)
Warehouse: Sources are translated from their local schema to a global schema and copied to a central DB
-> unidirectional flows
Mediator (Virtual Warehouse): Turns user query into a sequence of source queries and assembles the results of these queries into an aggregated result
-> Complex architecture
Definition Data Warehouse
A storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data in an aggregated, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs.
Characteristics of Data Warehouses?
Subject-Orientation: Data is organized not by operational application but by business subject -> data is grouped around subjects
-> Simple data querying
Integration: Contains consolidated data from several applications/ DBs
Stored data is not updated or deleted
Data inconsistencies are removed -> need for standardization
Non-volatile and time-variant:
No overwriting of “old” data
Contains historical data to allow analysis of the past, relate information to the present, and enable forecasts for the future
OLTP vs. OLAP
OLTP = Online Transaction Processing:
Operational Systems, which support the execution of business processes
OLAP = Online Analytical Processing:
Analyzing the data describing business transactions/ processes
Objective: Turn raw transactional data -> strategic/ tactical/ operative information
Requirements for OLTP?
Optimized for small and short transactions
Access to up-to-date, consistent DBs
Avoidance of redundancies -> normalized schemas
Requirements for OLAP?
Queries with large result sets
No immediate updates or inserts, but large periodic batch inserts
(Controlled) redundancy is a necessity for performance reasons
Goal: Response Time of seconds/ a few minutes
Why does the Water and Oil analogy fit for OLTP vs. OLAP?
Lock Conflicts: long-running OLAP reads may block OLTP writes
Freshness of data:
OLTP: up-to-date data
OLAP: reproducibility of analysis
Precision:
OLTP: exact
OLAP: sampling, statistical summaries, confidence intervals
DWH Reference Architecture
Source DB in DWH Reference Architecture?
DB of an application which supports one or more types of business transactions
Landing Area in the DWH Reference Architecture?
DB that is able to store a single data extract of a subset of one Source DB
-> Data of the Source DB is typically accessible via APIs and follows a pre-defined schema of a subset of the data
Staging Area in the DWH Reference Architecture?
DB that is able to store matching data extracts from various Landing Areas in an integrated form
-> Upload to DW from Staging Area happens once data from all Landing Areas are available
-> Schema basically corresponds 1:1 with the DW schema
Data Warehouse in the DWH Reference Architecture?
DB containing the history of all complete Staging Areas
-> Its integrated schema is frequently more or less in 3NF
Data Mart in DWH Reference Architecture?
Objective: DB containing data describing the (present or past) performance of one or more types of business transactions
Master Data in DWH Reference Architecture?
“Factored out“ information required by many applications to establish the context of data
-> Typically ends up in the dimensional tables of star schema
“Structured“ master data: customers, vendors, products, categorization, attributes/ codes
Categorization of Master Data?
Owned externally
Owned internally
Meta Data in DWH Reference Architecture?
Data about Data in DWH
-> Necessary for using, building and administering a DWH
Types of DWH Metadata
Operational metadata
Extraction and transformation metadata
End-user metadata
Name Use Cases of DWH
Retail: Customer Loyalty, Market Planning
Manufacturing: Cost Reduction, Logistics Management
Utilities: Asset Management, Resource Management
Airlines: Route profitability, Yield Management
Name typical Architectural Options of the DWH Reference Architecture!
Single stand-alone data mart: May be used if there is no need for integration
Separation of OLAP solution from a single integrated OLTP system
Local Reporting / OLAP solution for (non-integrated) OLTP system
Several independent stand-alone data marts: “quick and dirty solution“ (-> soon to be replaced)
A Conformed constellation of data marts (sharing common dimensions)
-> DW Bus Architecture
No “explicit“ DW -> no common schema
Conformed facts and dimensions
Mapping through Master Data stores
A single Integrated Data Warehouse feeding all data marts
Master Data can be seen as just another Source DB
Typically 1 landing area for each Source DB + integrated Staging Area
Multiple Data Warehouses for different functions (problematic but not uncommon)
ETL?
Extraction, Transformation, Loading
Why have Virtual DWHs not yet been widely adopted?
Heterogeneity of legacy systems
Strain on operative systems from management queries
Lack of historization in operative systems
Requirement of complex transformation rules
General Process of ETL?
Extract information from OLTP sources
Transform it
Load into the DWH
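A minimal sketch of these three steps in Python, using in-memory SQLite databases; all table and column names are hypothetical examples:
```python
import sqlite3

# Hypothetical OLTP source and DWH target, both in-memory for the sketch
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (order_id INT, amount_cents INT, order_date TEXT)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1999, "2024-01-03"), (2, None, "2024-01-04"), (3, 500, "2024-01-05")])

dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE fact_orders (order_id INT, amount_eur REAL, order_date TEXT)")

# Extract: pull raw rows from the operational source
rows = oltp.execute("SELECT order_id, amount_cents, order_date FROM orders").fetchall()

# Transform: unify units (cents -> EUR) and filter out rows with missing values
clean = [(oid, cents / 100.0, date) for oid, cents, date in rows if cents is not None]

# Load: periodic batch insert into the warehouse fact table
dwh.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", clean)
dwh.commit()
```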
Why are “intermediate results“ stored in the Landing Area and so on?
Checking of Data Quality
Serves as backup source if a process fails
How can the Extraction component or phase be monitored? What strategies are there for the timing of the Extraction?
Trigger-based: change in source DB -> Extraction
Replication based
Log-based: analysis of logs to detect changes
Timestamp-based: extract records whose change timestamp is newer than the last extraction run
Snapshot-based: Periodically copy complete data sets into a file and compare snapshots to detect changes
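A sketch of snapshot-based change detection, assuming rows are keyed by a primary key (all data hypothetical):
```python
# Compare the current full extract with the previous snapshot to detect
# inserts, updates and deletes. Rows are keyed by a primary key.
def diff_snapshots(old: dict, new: dict):
    inserted = {k: v for k, v in new.items() if k not in old}
    deleted = {k: v for k, v in old.items() if k not in new}
    updated = {k: v for k, v in new.items() if k in old and old[k] != v}
    return inserted, updated, deleted

previous = {1: ("Alice", 100), 2: ("Bob", 200)}
current = {1: ("Alice", 150), 3: ("Carol", 300)}  # 1 updated, 2 deleted, 3 inserted
print(diff_snapshots(previous, current))
```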
What is done in the Transformation phase?
Extracted data is prepared for the loading into the DW
Structural preparation: schema integration
Content preparation: Data Cleaning and Integration
What steps are involved in the Transformation Phase?
Transformation of extracted data into a consistent format (-> data type, units, encodings, etc.)
Data Cleaning: Handling of missing, redundant, inconsistent or outdated values
Data Scrubbing: Usage of domain knowledge / business rules to detect “dirty” data
Data Auditing: Assessment of the data quality and usage of data mining techniques to detect rules and deviations
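A small sketch of such content preparation in Python; the records, synonym table and business rule are made-up examples:
```python
# Hypothetical cleaning step: unify encodings/units and handle missing values
RAW = [
    {"country": "DE", "revenue": "1.200,50", "currency": "EUR"},
    {"country": "Germany", "revenue": None, "currency": "EUR"},
]
COUNTRY_SYNONYMS = {"Germany": "DE"}  # business rule for synonym handling

def clean(record):
    record["country"] = COUNTRY_SYNONYMS.get(record["country"], record["country"])
    if record["revenue"] is None:
        return None  # drop (or impute) records with missing measures
    # convert German decimal format "1.200,50" to a float
    record["revenue"] = float(record["revenue"].replace(".", "").replace(",", "."))
    return record

cleaned = [r for r in (clean(dict(rec)) for rec in RAW) if r is not None]
print(cleaned)  # [{'country': 'DE', 'revenue': 1200.5, 'currency': 'EUR'}]
```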
4 Transformation Layers?
Filtering
automatic detection, automatic correction
automatic detection, manual correction
manual detection, automatic correction
Harmonization
Syntactic harmonization -> Consistent format
Business harmonization -> Business indicators
Aggregation
-> Using dimensional hierarchy tables and parallel hierarchies
Enrichment: Calculation of business performance indicators and storage as attributes
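A sketch of the aggregation and enrichment layers, assuming a simple day -> month hierarchy and hypothetical cost figures:
```python
from collections import defaultdict

# Daily facts and a (hypothetical) dimensional hierarchy day -> month
daily_sales = {"2024-01-03": 120.0, "2024-01-17": 80.0, "2024-02-02": 50.0}

def day_to_month(day):
    return day[:7]  # "2024-01-03" -> "2024-01"

# Aggregation layer: roll daily measures up along the hierarchy
monthly = defaultdict(float)
for day, revenue in daily_sales.items():
    monthly[day_to_month(day)] += revenue

# Enrichment layer: compute a derived business indicator and store it
costs = {"2024-01": 150.0, "2024-02": 30.0}
enriched = {m: {"revenue": rev, "margin": rev - costs[m]} for m, rev in monthly.items()}
print(enriched)
```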
What are implementation decisions for ETL?
Buying an ETL tool that offers various services
Building your own solution based on file technology and/ or DBMS technology
Goals of Designing Analytic Databases/ Data Marts
In general Data Marts support analysis of 1 or more business processes/ transactions by e.g.
running queries that need to look at a substantial amount of data
=> Maximizing query read performance through caching and denormalization
What is the typical design process of data marts?
Separate analytical databases from operational databases
Determine how to measure the execution of a given business process (-> KPIs) and look at the context (-> dimensions) of this business process
Design corresponding Star or Snowflake Schema
Tune using indexes and other physical design options
What ingredients for data marts can be extracted from the analysis of business processes?
Measure
Context (category)
Time
+ aggregation & selection
Components of Star Schema?
Dimension Tables: Provide “business context” and are used for categorization
Not normalized
Time Dimension: Special dimension (table) for time
Fact Tables: have a grain defining the most atomic unit of data being captured (e.g. transaction level: measures in a certain context + daily / weekly / monthly / quarterly summaries)
Attributes:
measures (basic or derived)
surrogate keys -> foreign keys to dimension tables
Primary Key
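A minimal star schema sketch in Python/SQLite; the fact and dimension tables are hypothetical examples, not a prescribed design:
```python
import sqlite3

db = sqlite3.connect(":memory:")
# Dimension tables provide the business context (denormalized, not in 3NF)
db.execute("CREATE TABLE dim_product (product_key INT PRIMARY KEY, name TEXT, category TEXT)")
db.execute("CREATE TABLE dim_date (date_key INT PRIMARY KEY, day TEXT, month TEXT, year INT)")
# Fact table at transaction grain: surrogate foreign keys + numeric measures
db.execute("""CREATE TABLE fact_sales (
    product_key INT REFERENCES dim_product(product_key),
    date_key    INT REFERENCES dim_date(date_key),
    quantity    INT,
    revenue     REAL)""")

db.execute("INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics')")
db.execute("INSERT INTO dim_date VALUES (20240103, '2024-01-03', '2024-01', 2024)")
db.execute("INSERT INTO fact_sales VALUES (1, 20240103, 2, 1999.0)")

# Typical star query: join facts to their dimensions, then aggregate
print(db.execute("""
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY p.category, d.year""").fetchall())
```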
Facts vs. Dimensions
Facts:
Usually verbs
Business performance indicators
Usually numeric and continuous
Can be aggregated
Data volume: high
Dimensions:
Usually nouns
Represent criteria
Usually symbolic and discrete
Serve the selection, aggregation and navigation of facts
Data volume: low
Star vs. Snowflake Schema
Star Schema: Database schema where a single central table (fact table) is connected to one or more denormalized dimension tables.
Snowflake Schema: Variation of Star Schema, where dimension tables are normalized. This means that the dimension tables are broken down into multiple related tables, rather than having all of the attributes in a single table
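A sketch of a snowflaked (normalized) product dimension, again with hypothetical tables; note the extra join compared to the star variant above:
```python
import sqlite3

db = sqlite3.connect(":memory:")
# Snowflaked product dimension: category attributes moved into their own table
db.execute("CREATE TABLE dim_category (category_key INT PRIMARY KEY, category TEXT)")
db.execute("""CREATE TABLE dim_product (
    product_key INT PRIMARY KEY,
    name TEXT,
    category_key INT REFERENCES dim_category(category_key))""")

db.execute("INSERT INTO dim_category VALUES (10, 'Electronics')")
db.execute("INSERT INTO dim_product VALUES (1, 'Laptop', 10)")

# Reaching the category attribute now costs an extra join
print(db.execute("""
    SELECT p.name, c.category
    FROM dim_product p JOIN dim_category c ON p.category_key = c.category_key
""").fetchall())
```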
Pros and Cons of Snowflake Schema?
Pros:
Small saving in storage space
Normalized structures are easier to update and maintain
Cons:
Schema becomes less intuitive -> complexity
Browsing through contents becomes more difficult
Decreased query performance due to additional joins
Name the 5 desired properties of OLAP Systems
F ast: consistent response times within seconds
A nalysis: support for any relevant business logic and statistical analysis
S hared: multi-user access with appropriate security
M ultidimensional: multidimensional view of the data as core requirement
I nformation: scalability to all data and derived information needed
-> FASMI
Operations in multidimensional data space?
Slicing: filter conditions in 1 or more dimensions
-> e.g. filter on a certain year
Dicing: Adding / Removing / Exchanging dimensions
Drill-Down: Going from a broader level of aggregation to a finer, more detailed level
-> add a column to the group by clause
Roll-Up: Going from a finer level of aggregation to a broader level
-> remove a column from the group by clause
Ranking: e.g. top performers, top risks, etc.
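A sketch of these operations with pandas on a tiny hypothetical cube:
```python
import pandas as pd

sales = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024],
    "region":  ["EU", "US", "EU", "US"],
    "product": ["A", "A", "B", "B"],
    "revenue": [100, 200, 150, 250],
})

# Slicing: filter condition on one dimension (e.g. a certain year)
slice_2024 = sales[sales["year"] == 2024]

# Roll-up: remove a column from the group-by -> broader aggregation level
by_year = sales.groupby("year")["revenue"].sum()

# Drill-down: add a column to the group-by -> finer aggregation level
by_year_region = sales.groupby(["year", "region"])["revenue"].sum()

# Ranking: e.g. top performers by revenue
top = sales.nlargest(2, "revenue")
print(by_year, by_year_region, top, sep="\n")
```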
Name 3 physical Implementations of OLAP
R-OLAP: Relational OLAP
M-OLAP: Multi-Dimensional OLAP
H-OLAP: Hybrid OLAP
R-OLAP vs. M-OLAP
R-OLAP: Star- and Snowflake-Schemas in classic, relational DBMSs
+ Compatibility (can be accessed by any SQL tool)
+ Flexibility (not limited to cube structure)
+ Better scalability for large data volumes and a high number of dimensions
+ Shorter load times
+ Costs may be lower
M-OLAP: Proprietary DBMSs optimized for storage of multidimensional data
+ Faster response time and better query performance
+ Smaller storage footprint
+ More powerful as specialized technology for the specific purpose
+ More suitable for complex calculations that do not translate well into SQL
H-OLAP?
Combination of R-OLAP and M-OLAP
Vertical Partitioning:
M-OLAP for aggregations (Roll-Ups)
R-OLAP for Drill-Downs
Horizontal Partitioning:
Recent or frequently used data in M-OLAP
Older or less frequently used data in R-OLAP
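A toy sketch of the partitioning idea (all data hypothetical): pre-computed roll-ups live in an M-OLAP-style cube, detail rows stay relational:
```python
# Pre-aggregated cube for roll-ups, relational detail rows for drill-downs
cube = {("2024", "EU"): 250.0}  # (year, region) -> revenue
detail = [("2024-01-03", "EU", 100.0), ("2024-02-02", "EU", 150.0)]

def query(level, year, region):
    if level == "year":  # roll-up: served directly from the cube
        return cube[(year, region)]
    # drill-down: falls back to scanning the relational detail data
    return [row for row in detail if row[0].startswith(year) and row[1] == region]

print(query("year", "2024", "EU"))  # 250.0
print(query("day", "2024", "EU"))   # the two detail rows
```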
Name 3 R-OLAP Optimizations
Horizontal Partitioning: Split table into n disjoint parts with the same schema -> Union of all parts equals original table
By range
By hash (application of hash-function)
Round Robin: Archive oldest partition and reuse it for the latest data
By reference
Optimization for scan/ selection operations: Bitmap Indexes (see the sketch after this list)
Optimization for memory hierarchy: Vertical Partitioning/ Column Stores: Divide a table into multiple tables that contain fewer columns
Unique key column is replicated for joining
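A minimal sketch of a bitmap index over a low-cardinality column (data made up):
```python
# One bit vector per distinct value of a low-cardinality column
regions = ["EU", "US", "EU", "ASIA", "US"]  # column of a fact table
bitmaps = {}
for rid, value in enumerate(regions):
    bitmaps.setdefault(value, 0)
    bitmaps[value] |= 1 << rid  # set the bit for this row id

# The selection "region = EU OR region = US" becomes a cheap bitwise OR
mask = bitmaps["EU"] | bitmaps["US"]
matching_rids = [rid for rid in range(len(regions)) if mask >> rid & 1]
print(matching_rids)  # [0, 1, 2, 4]
```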
Decomposed Storage Model (DSM)?
Extreme case of vertical partitioning: Division of each table into a set of 2-column tables (RID, attribute)
Column Stores?
Specialized stores that use DSM to lay out data on disk
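A sketch of DSM-style decomposition in plain Python; column and row data are hypothetical:
```python
# Decompose a table into one (RID, attribute) table per column
table = [("Alice", 30, "EU"), ("Bob", 25, "US")]
columns = ["name", "age", "region"]

dsm = {
    col: [(rid, row[i]) for rid, row in enumerate(table)]
    for i, col in enumerate(columns)
}
# A scan over a single attribute now touches only that column's table
print(dsm["age"])  # [(0, 30), (1, 25)]

# Rows are reassembled by joining on the replicated RID
row0 = tuple(dsm[col][0][1] for col in columns)
print(row0)  # ('Alice', 30, 'EU')
```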
DWH Waterfall Development Process?
Specification: Assessment of current state and requirements
Platform, Software and Tool Selection: Analysis and evaluation of potential alternatives
Realization
Conceptualization
Implementation
Execution
Operation
What are typical criteria for DWH DB selection?
Scalability: -> Data Volume, Complexity of Queries
OLAP capabilities: -> Cube operations
Optimization for analytical queries
Cloud or on-premise?
Top-down approach of Inmon vs. Bottom-up approach of Kimball
Inmon (Top-down approach):
Build centralized DW first and then departmental Data Marts
Deductive
Generic -> Specific
Kimball (Bottom-up approach):
1 Data Mart for each major business process rather than building a centralized DW first
-> Enterprise-wide cohesion through a data bus
Inductive
Specific -> Generic
3 (dis-)advantages of Inmon's Top-down Approach
Advantages:
Enterprise-wide view of data
Inherently architectonic
Single, central storage of data
Centralized rules and controls
Quick results if implemented in iterations
Disadvantages:
Longer building time
High risk of failure -> high effort without proof of concept
Requirement of high levels of cross-functional skills
3 (dis-)advantages of Kimball's Bottom-up Approach
Advantages:
Faster and easier implementation
Favorable ROI and proof of concept
Lower risk of failure
Inherently incremental
Learning by doing
Disadvantages:
No enterprise-wide view as each Data Mart has its own narrow view of data
Redundant data in every Data Mart
Inconsistent and irreconcilable data
Proliferates unmanageable interfaces
Name 3 fundamental concepts of Kimball's DW Lifecycle Process
Focus on business
Dimensionally structure the data that is delivered to the business
Iteratively develop overall data warehouse environment in manageable lifecycle increments
Kimball's DW Lifecycle Process
Project vs. Program in Kimball's DW Lifecycle Process?
Project = Single iteration of the lifecycle
Program = Ongoing coordination of resources, infrastructure, …
What is done in the Technology Track of Kimball's DW Lifecycle Process?
Technical Architecture Design
Product Selection and Installation
What is done in the Data Track of Kimball's DW Lifecycle Process?
Dimensional Modeling (-> Enterprise data warehouse bus matrix)
Physical Design (Design and set up of database environment)
Data Integration (ETL)
What is done in the Applications Track of Kimball's DW Lifecycle Process?
Reporting and analytics design
Reporting and analytics development
Process of Agile BI Approaches
Identify Knowledge Workers
Conduct Interviews
Identify Source Data
Design Data Repositories (Enterprise DW + Departmental Data Marts)
Design ETL
Motivation behind Agile BI Approaches?
Evolving user and business requirements and new business rules
Problems and Risks of Agile BI Approaches?
An architectural approach is still necessary
Management across interdependent BI projects is critical
Conflict with top-down, slow-moving data governance
Not necessarily cheaper
Recent Developments in Data Warehousing?
HTAP (Hybrid Transaction/ Analytical Processing) Systems: more informed and real-time decision making
-> Combination of OLAP and OLTP system
Cloud DWH/ BI
Mobile BI