Data Science Use Cases (DSUCs) Definition
Applications of data science to achieve business goals via prediction and insights from data.
DSUC identification criteria
Value – potential ROI, operational improvement, revenue growth.
Effort – time, resources, human capital needed.
Risk – market, technology, regulatory uncertainties.
DSUC Visualization
Bubble diagrams in project portfolio management show value (size), effort & risk (position).
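A minimal sketch (assuming matplotlib) of how such a bubble diagram could be drawn; the use cases and the effort/risk/value scores below are made-up placeholders:

```python
# Illustrative sketch only: hypothetical DSUCs and scores.
import matplotlib.pyplot as plt

use_cases = ["Churn prediction", "Predictive maintenance", "Fraud detection"]
effort = [3, 7, 5]        # x-axis: estimated effort (e.g., person-months)
risk = [2, 6, 4]          # y-axis: risk score (1 = low, 10 = high)
value = [800, 300, 600]   # bubble size: expected value, scaled for plotting

plt.scatter(effort, risk, s=value, alpha=0.5)
for name, x, y in zip(use_cases, effort, risk):
    plt.annotate(name, (x, y))
plt.xlabel("Effort")
plt.ylabel("Risk")
plt.title("DSUC portfolio (bubble size = expected value)")
plt.show()
```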
common reasons for failure of DSUCs and solutions
Lack of right data → secure, unbiased data, address ethics.
Unclear project goals → define problem, ask right questions, set final goal.
Wrong team composition → include data scientists, engineers, analysts, domain experts.
No clear methodology → combine DS project lifecycle with agile.
Stop after deployment → continuously improve, ensure security.
Value Proposition Questions
What is the value of the knowledge gained?
What will be learned about dataset & hypothesis?
How valuable are results for positive or negative predictions?
Types of DSUCs
customer-related
operational-related
fraud detection/security
customer-related DSUCs
Goals: loyalty, acquisition, reduced costs, churn prevention.
Questions:
Deal size drivers?
Customer journey?
Cost-effective acquisition?
Problematic features?
Example: Linking purchase history + social media for targeted marketing → higher conversion, lower acquisition costs.
operational-related DSUCs
Goals: cost reduction, service quality, asset optimization.
Questions:
Predict maintenance/failures?
Impact of production changes?
Minimize processing time?
Example: Vivint uses IoT data to reduce false alarms, improve efficiency.
fraud detection/security use cases
Goals: detect unauthorized access, fraud patterns, high-risk customers.
Example: Detecting small fraudulent credit card transactions
Dataset Preparation & Prediction Model
Data collection: Internal/external DBs, sensors, web scraping, etc.
Preprocessing: Clean noise, remove redundant/missing values, select relevant features.
Training and testing datasets:
Training set: build and learn the model
Testing set: evaluate the model’s accuracy
Prediction types:
Classification: categorized into classes (e.g., {fraud, not fraud}).
Regression (e.g., acquisition cost prediction).
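A minimal sketch of the split-train-evaluate flow, assuming scikit-learn; the synthetic dataset is a stand-in for a prepared DSUC dataset (e.g., labelled fraud/not-fraud transactions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a preprocessed DSUC dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Training set: build and learn the model; testing set: evaluate its accuracy.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Classification example: predicts discrete classes such as {fraud, not fraud}.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```

A regression DSUC (e.g., predicting acquisition cost) would use the same split with a regressor such as LinearRegression and a continuous target.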
making predictions and decisions
Evaluation:
Classification → threshold setting.
Regression → error margin (e.g., <5%).
Updating: Retrain model with new data or feature changes.
possible results of a classification prediction model when applied to a data record
true positive (TP)
true negative (TN)
false positive (FP) —> type I classification error
false negative (FN) —> type II classification error
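A minimal sketch of counting the four outcome types for a binary classifier (1 = positive class); the label and prediction lists are hypothetical:

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # type I error
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # type II error
print(tp, tn, fp, fn)
```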
commonly used metrics for evaluating a model
accuracy – ratio of the number of correct predictions to the total number of predictions
precision – how correct the model is when returning a positive result
recall – how often the model produces true positives; used if we are more tolerant of false positives than false negatives
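A minimal sketch expressing the three metrics in terms of the confusion-matrix counts; the example counts are hypothetical:

```python
def accuracy(tp, tn, fp, fn):
    # ratio of correct predictions to all predictions
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # how correct the model is when it returns a positive result
    return tp / (tp + fp)

def recall(tp, fn):
    # how many of the actual positives the model finds
    return tp / (tp + fn)

print(accuracy(4, 3, 1, 2), precision(4, 1), recall(4, 2))
```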
Receiver Operator Characteristic (ROC) Curve
trade-off between the true positive rate and the false positive rate at every possible cutoff value
ROC helps to find the best possible realistic threshold value, which results in the highest TP rate and the lowest FP rate
steps for generating ROC curve
a binary classifier trained on a labeled dataset produces a probability score for each instance → the lowest predicted probability score is used to set the initial value of the threshold
assign each record in the testing set to a class using the current threshold and count the TP, TN, FP, and FN values
Calculate FP rate and TP rate
plot a point on the ROC curve with coordinates (FP rate, TP rate)
increase threshold value to the next predicted probability score, repeat steps 2-4
formulas for the ROC curve
TP rate = TP / (TP + FN)
FP rate = FP / (FP + TN)
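A minimal sketch of the steps above in plain Python; the probability scores and labels are hypothetical:

```python
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]   # predicted probability per instance
labels = [0,   0,   1,    0,   1,   1,   0,   1  ]   # true classes

roc_points = []
for threshold in sorted(scores):                      # start at the lowest score, then increase
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(predicted, labels) if p == 0 and l == 1)
    tn = sum(1 for p, l in zip(predicted, labels) if p == 0 and l == 0)
    roc_points.append((fp / (fp + tn), tp / (tp + fn)))   # one (FP rate, TP rate) point
print(roc_points)
```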
regression model evaluation metrics
evaluate how close the model output (y) is to the desired output (d)
absolute error
relative error
mean absolute percentage error
square error
mean square error
mean absolute error
root mean square error
absolute error
absolute difference between the model’s output (y) and the desired output (d)
relative error
absolute error divided by the desired output to obtain a unit-less percentage
mean absolute percentage error (MAPE)
average relative error calculated over the entire testing set of n data records
especially useful when the probability density distribution of the values is sufficiently far from zero, so that division by values near zero does not have a significant impact
square error
ensures that a positive quantity is obtained
adds significant weight to large error values
mean square error (MSE)
average square error over the entire testing set for n data records
can be dominated by outliers
mean absolute error (MAE)
more robust than the mean squared error to outliers
root mean square error (RMSE)
square root of the mean square error
easier to interpret than mean square error
has the same scale as the desired outputs
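A minimal sketch computing these regression metrics for hypothetical model outputs y and desired outputs d:

```python
import math

d = [100.0, 200.0, 300.0, 400.0]   # desired (actual) outputs
y = [110.0, 190.0, 330.0, 380.0]   # model outputs

n = len(d)
abs_errors = [abs(yi - di) for yi, di in zip(y, d)]
mae  = sum(abs_errors) / n                                          # mean absolute error
mape = sum(e / abs(di) for e, di in zip(abs_errors, d)) / n * 100   # mean absolute percentage error (%)
mse  = sum((yi - di) ** 2 for yi, di in zip(y, d)) / n              # mean square error
rmse = math.sqrt(mse)                                               # root mean square error
print(mae, mape, mse, rmse)
```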
Role of KPIs
After prediction model evaluation → implement to deliver DSUC value.
End users/decision-makers must check if DSUC value is achieved.
Assessment via performance measures → Key Performance Indicators (KPIs).
Distinction (Parmenter, 2020):
Result indicators – measure outcomes.
KPIs – focus on critical success factors, both current & future.
characteristics of effective KPIs (Parmenter, 2020)
Easy to understand by all staff.
Measured frequently (daily/weekly).
Assigned to relevant task manager (including CEO).
Show positive/negative deviations from business objectives.
Non-financial (not in $/€).
Significant impact on organizational performance.
Clear responsibility assigned (specific teams accountable).
Note: Non-financial criterion (5) often ignored in practice.
examples of performance measures
Planned innovations.
Late customer deliveries.
Number of late projects.
Downtime due to breakdowns or staff absence.
Unresolved customer complaints.
Number of vacant positions.
New initiatives reported weekly.
Employees in critical roles resigning.
Example: Employee turnover rate
Measures % of employees leaving the company.
Formula: turnover rate = (number of employees who left / average number of employees) × 100
Can be calculated for specific teams/time periods.
Helps estimate satisfaction across organization.
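A small worked example with made-up numbers:

```python
employees_left = 3        # left during the period
average_headcount = 50    # average number of employees in the period
turnover_rate = employees_left / average_headcount * 100
print(turnover_rate)      # 6.0 (%)
```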
Cognitive Biases & Decision-Making Fallacies
Both experts & non-experts are vulnerable to biases (Montibeller & Winterfeldt, 2015).
Bias = prejudice influencing unfair judgment/decisions.
Types:
Cognitive bias → systematic deviation due to culture/experience.
Motivational bias → influenced by desire for a preferred outcome.
In data science: biases can distort datasets, processing & prediction models → inaccurate decisions.
Awareness & de-biasing techniques are essential.
Categories of Bias (Baer, 2019)
action-oriented biases
stability biases
pattern-recognition biases
interest and social biases
action-oriented biases
Favor action > inaction.
Examples: overoptimism, overconfidence, bizarreness effect.
Overconfidence → excessive belief in one’s judgment → ignores alternatives.
Mitigation: humility, diverse perspectives, feedback, objective assessments.
stability biases
Preference for status quo / resistance to change.
Examples: status quo bias, loss aversion, anchoring effect.
Anchoring effect → first info strongly influences estimates (e.g., population guesses, negotiations).
Mitigation: avoid anchors or use different expert anchors.
pattern-recognition biases
Brain creates faulty rules/patterns.
Example: Confirmation bias → seek info confirming existing beliefs.
Mitigation: consult diverse experts, evaluate alternative hypotheses, apply probability assessments.
interest & social biases
Rooted in personal interests, desires, preferences.
Influence analysis, data interpretation, emphasis.
Social bias: conforming to group/social expectations.
de-biasing techniques
Probability training → improves understanding of statistics.
Counterfactuals → rebalance data (e.g., replace “He is a doctor” → “She is a doctor”); see the sketch after this list.
Caution: removing one bias may introduce costs or new issues.
Best strategy → prevent bias at dataset level + ensure data diversity.
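A minimal sketch of the counterfactual rebalancing idea for text data; the swap map and sentences are hypothetical, and capitalization is handled loosely:

```python
swap = {"he": "she", "she": "he"}

def counterfactual(sentence):
    # replace gendered pronouns to create the counterfactual variant
    return " ".join(swap.get(w.lower(), w) for w in sentence.split())

sentences = ["He is a doctor", "She is a nurse"]
augmented = sentences + [counterfactual(s) for s in sentences]
print(augmented)   # both the original and the gender-swapped variants are kept
```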