Data Science Use Cases (DSUCs) Definition
Applications of data science to achieve business goals via prediction and insights from data.
DSUC identification criteria
Value – potential ROI, operational improvement, revenue growth.
Effort – time, resources, human capital needed.
Risk – market, technology, regulatory uncertainties.
DSUC Visualization
Bubble diagrams in project portfolio management show value (size), effort & risk (position).
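A minimal sketch (assuming matplotlib) of how such a bubble diagram could be drawn; the use cases and the effort/risk/value scores below are made-up placeholders:

```python
# Illustrative sketch only: hypothetical DSUCs and scores.
import matplotlib.pyplot as plt

use_cases = ["Churn prediction", "Predictive maintenance", "Fraud detection"]
effort = [3, 7, 5]        # x-axis: estimated effort (e.g., person-months)
risk = [2, 6, 4]          # y-axis: risk score (1 = low, 10 = high)
value = [800, 300, 600]   # bubble size: expected value, scaled for plotting

plt.scatter(effort, risk, s=value, alpha=0.5)
for name, x, y in zip(use_cases, effort, risk):
    plt.annotate(name, (x, y))
plt.xlabel("Effort")
plt.ylabel("Risk")
plt.title("DSUC portfolio (bubble size = expected value)")
plt.show()
```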
common reasons for failure of DSUCs and solutions
Lack of right data → secure, unbiased data, address ethics.
Unclear project goals → define problem, ask right questions, set final goal.
Wrong team composition → include data scientists, engineers, analysts, domain experts.
No clear methodology → combine DS project lifecycle with agile.
Stop after deployment → continuously improve, ensure security.
Value Proposition Questions
What is the value of the knowledge gained?
What will be learned about dataset & hypothesis?
How valuable are results for positive or negative predictions?
Types of DSUCs
customer-related
operational-related
fraud detection/security
customer-related DSUCs
Goals: loyalty, acquisition, reduced costs, churn prevention.
Questions:
Deal size drivers?
Customer journey?
Cost-effective acquisition?
Problematic features?
Example: Linking purchase history + social media for targeted marketing → higher conversion, lower acquisition costs.
operational-related DSUCs
Goals: cost reduction, service quality, asset optimization.
Questions:
Predict maintenance/failures?
Impact of production changes?
Minimize processing time?
Example: Vivint uses IoT data to reduce false alarms, improve efficiency.
fraud detection/security use cases
Goals: detect unauthorized access, fraud patterns, high-risk customers.
Example: Detecting small fraudulent credit card transactions
Dataset Preparation & Prediction Model
Data collection: Internal/external DBs, sensors, web scraping, etc.
Preprocessing: Clean noise, remove redundant/missing values, select relevant features.
Training and testing datasets:
Training set: build and learn the model
Testing set: evaluate the model’s accuracy
Prediction types:
Classification: categorized into classes (e.g., {fraud, not fraud}).
Regression (e.g., acquisition cost prediction).
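A minimal sketch of the split-train-evaluate flow, assuming scikit-learn; the synthetic dataset is a stand-in for a prepared DSUC dataset (e.g., labelled fraud/not-fraud transactions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a preprocessed DSUC dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Training set: build and learn the model; testing set: evaluate its accuracy.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Classification example: predicts discrete classes such as {fraud, not fraud}.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```

A regression DSUC (e.g., predicting acquisition cost) would use the same split with a regressor such as LinearRegression and a continuous target.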
making predictions and decisions
Evaluation:
Classification → threshold setting.
Regression → error margin (e.g., <5%).
Updating: Retrain model with new data or feature changes.
possible results of a classification prediction model when applied to a data record
true positive (TP)
true negative (TN)
false positive (FP) —> type I classification error
false negative (FN) —> type II classification error
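A minimal sketch of counting the four outcome types for a binary classifier (1 = positive class); the label and prediction lists are hypothetical:

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # type I error
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # type II error
print(tp, tn, fp, fn)
```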
commonly used metrics for evaluating a model
accuracy – ratio of the number of correct predictions to the total number of predictions
precision – how correct the model is when returning a positive result
recall – how often the model produces true positives; used if we are more tolerant of false positives than false negatives
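A minimal sketch expressing the three metrics in terms of the confusion-matrix counts; the example counts are hypothetical:

```python
def accuracy(tp, tn, fp, fn):
    # ratio of correct predictions to all predictions
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # how correct the model is when it returns a positive result
    return tp / (tp + fp)

def recall(tp, fn):
    # how many of the actual positives the model finds
    return tp / (tp + fn)

print(accuracy(4, 3, 1, 2), precision(4, 1), recall(4, 2))
```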
Receiver Operator Characteristic (ROC) Curve
trade-off between the true positive rate and the false positive rate at every possible cutoff value
ROC helps to find the best possible realistic threshold value, which results in the highest TP rate and the lowest FP rate
steps for generating ROC curve
a binary classifier trained on a labeled dataset produces a probability score for each instance → the lowest predicted probability score is used to set the initial value of the threshold
assign each record in the testing set to a class using the current threshold and count the TP, TN, FP, and FN values
Calculate FP rate and TP rate
plot a point on the ROC curve with coordinates (FP rate, TP rate)
increase threshold value to the next predicted probability score, repeat steps 2-4
formulas for the ROC curve
TP rate = TP / (TP + FN)
FP rate = FP / (FP + TN)
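A minimal sketch of the steps above in plain Python; the probability scores and labels are hypothetical:

```python
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]   # predicted probability per instance
labels = [0,   0,   1,    0,   1,   1,   0,   1  ]   # true classes

roc_points = []
for threshold in sorted(scores):                      # start at the lowest score, then increase
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(predicted, labels) if p == 0 and l == 1)
    tn = sum(1 for p, l in zip(predicted, labels) if p == 0 and l == 0)
    roc_points.append((fp / (fp + tn), tp / (tp + fn)))   # one (FP rate, TP rate) point
print(roc_points)
```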
regression model evaluation metrics
evaluate how close the model output (y) is to the desired output (d)
absolute error
relative error
mean absolute percentage error
square error
mean square error
mean absolute error
root mean square error
absolute error
absolute difference between the model’s output (y) and the desired output (d)
relative error
absolute error divided by the desired output to obtain a unit-less percentage
mean absolute percentage error (MAPE)
average relative error calculated over the entire testing set of n data records
especially useful when the probability density distribution of the values is sufficiently far from zero, so that division by values near zero does not have a significant impact
square error
ensures that a positive quantity is obtained
adds significant weight to large error values
mean square error (MSE)
average square error over the entire testing set for n data records
can be dominated by outliers
mean absolute error (MAE)
more robust than the mean squared error to outliers
root mean square error (RMSE)
square root of the mean square error
easier to interpret than mean square error
has the same scale as the desired outputs
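A minimal sketch computing these regression metrics for hypothetical model outputs y and desired outputs d:

```python
import math

d = [100.0, 200.0, 300.0, 400.0]   # desired (actual) outputs
y = [110.0, 190.0, 330.0, 380.0]   # model outputs

n = len(d)
abs_errors = [abs(yi - di) for yi, di in zip(y, d)]
mae  = sum(abs_errors) / n                                          # mean absolute error
mape = sum(e / abs(di) for e, di in zip(abs_errors, d)) / n * 100   # mean absolute percentage error (%)
mse  = sum((yi - di) ** 2 for yi, di in zip(y, d)) / n              # mean square error
rmse = math.sqrt(mse)                                               # root mean square error
print(mae, mape, mse, rmse)
```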
Role of KPIs
After prediction model evaluation → implement to deliver DSUC value.
End users/decision-makers must check if DSUC value is achieved.
Assessment via performance measures → Key Performance Indicators (KPIs).
Distinction (Parmenter, 2020):
Result indicators – measure outcomes.
KPIs – focus on critical success factors, both current & future.
characteristics of effective KPIs (Parmenter, 2020)
Easy to understand by all staff.
Measured frequently (daily/weekly).
Assigned to relevant task manager (including CEO).
Show positive/negative deviations from business objectives.
Non-financial (not in $/€).
Significant impact on organizational performance.
Clear responsibility assigned (specific teams accountable).
Note: Non-financial criterion (5) often ignored in practice.
examples of performance measures
Planned innovations.
Late customer deliveries.
Number of late projects.
Downtime due to breakdowns or staff absence.
Unresolved customer complaints.
Number of vacant positions.
New initiatives reported weekly.
Employees in critical roles resigning.
Example: Employee turnover rate
Measures % of employees leaving the company.
Formula: turnover rate = (number of employees who left / average number of employees) × 100
Can be calculated for specific teams/time periods.
Helps estimate satisfaction across organization.
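A small worked example with made-up numbers:

```python
employees_left = 3        # left during the period
average_headcount = 50    # average number of employees in the period
turnover_rate = employees_left / average_headcount * 100
print(turnover_rate)      # 6.0 (%)
```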
Cognitive Biases & Decision-Making Fallacies
Both experts & non-experts are vulnerable to biases (Montibeller & Winterfeldt, 2015).
Bias = prejudice influencing unfair judgment/decisions.
Types:
Cognitive bias → systematic deviation due to culture/experience.
Motivational bias → influenced by desire for a preferred outcome.
In data science: biases can distort datasets, processing & prediction models → inaccurate decisions.
Awareness & de-biasing techniques are essential.
Categories of Bias (Baer, 2019)
action-oriented biases
stability biases
pattern-recognition biases
interest and social biases
action-oriented biases
Favor action > inaction.
Examples: overoptimism, overconfidence, bizarreness effect.
Overconfidence → excessive belief in one’s judgment → ignores alternatives.
Mitigation: humility, diverse perspectives, feedback, objective assessments.
stability biases
Preference for status quo / resistance to change.
Examples: status quo bias, loss aversion, anchoring effect.
Anchoring effect → first info strongly influences estimates (e.g., population guesses, negotiations).
Mitigation: avoid anchors or use different expert anchors.
pattern-recognition biases
Brain creates faulty rules/patterns.
Example: Confirmation bias → seek info confirming existing beliefs.
Mitigation: consult diverse experts, evaluate alternative hypotheses, apply probability assessments.
interest & social biases
Rooted in personal interests, desires, preferences.
Influence analysis, data interpretation, emphasis.
Social bias: conforming to group/social expectations.
de-biasing techniques
Probability training → improves understanding of statistics.
Counterfactuals → rebalance data (e.g., replace “He is a doctor” → “She is a doctor”); see the sketch after this list.
Caution: removing one bias may introduce costs or new issues.
Best strategy → prevent bias at dataset level + ensure data diversity.
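A minimal sketch of the counterfactual rebalancing idea for text data; the swap map and sentences are hypothetical, and capitalization is handled loosely:

```python
swap = {"he": "she", "she": "he"}

def counterfactual(sentence):
    # replace gendered pronouns to create the counterfactual variant
    return " ".join(swap.get(w.lower(), w) for w in sentence.split())

sentences = ["He is a doctor", "She is a nurse"]
augmented = sentences + [counterfactual(s) for s in sentences]
print(augmented)   # both the original and the gender-swapped variants are kept
```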