XAI
Tradeoff between explainability and accuracy
DL has high accuracy but low explainability
Interpretability: Link between features and prediction can be understood by a human
Produce explanations without sacrificing accuracy
FAST: Fairness, Accountability, Sustainability, Transparency
Black boxes vs. white boxes
Output: Visualizations, summary statements, keywords, feature importance
Partial Dependence Plot (PDP)
Shows effect of 1 or 2 features on outcome
Shows the relation between the target and a feature
Plot the feature on the x-axis and the prediction on the y-axis
Simple, easy to implement
Does not account for correlated features, limited to at most 2-3 features, ignores the feature distribution, effects might cancel out
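A minimal sketch of computing a PDP by hand (assumes a fitted model with a scikit-learn-style predict method and a NumPy feature matrix; all names are illustrative):

```python
import numpy as np

def partial_dependence(model, X, feature_idx, grid_points=20):
    """Partial dependence of model.predict on one feature: for each grid value,
    clamp the feature to that value for all rows of X and average the predictions."""
    grid = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), grid_points)
    pdp = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value              # clamp the feature everywhere
        pdp.append(model.predict(X_mod).mean())    # average over the data
    return grid, np.array(pdp)

# Illustrative usage: plot grid (feature) on the x-axis, pdp on the y-axis
# grid, pdp = partial_dependence(fitted_model, X_train, feature_idx=0)
```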
Individual Conditional Expectation (ICE)
Shows how an individual instance's prediction changes when a feature changes
One feature at a time
PDP is the average of the ICE curves over all instances
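The per-instance version gives ICE curves; averaging them over all instances recovers the PDP (same illustrative names as in the PDP sketch above):

```python
import numpy as np

def ice_curves(model, X, feature_idx, grid_points=20):
    """One curve per instance: the prediction as the chosen feature is varied."""
    grid = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), grid_points)
    curves = np.empty((X.shape[0], grid_points))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value
        curves[:, j] = model.predict(X_mod)        # keep every individual prediction
    return grid, curves

# curves.mean(axis=0) equals the PDP from the sketch above
```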
Local Interpretable Model-agnostic Explanations (LIME)
Input: sample you want to get an explanation for
Step 1: Create perturbations of the input image
Step 2: Make predictions with the black box model
Step 3: Train a simple surrogate model to explain the prediction locally
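A hedged sketch of the three LIME steps for an image (predict_fn, the grid "superpixels", the gray fill and the kernel width are simplifying assumptions, not the original LIME implementation):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_image_sketch(image, predict_fn, target_class, n_patches=16, n_samples=500):
    """Minimal LIME-style explanation for one image.

    image: HxWxC float array in [0, 1]; predict_fn maps a batch of images
    to class probabilities. The image is split into a grid of patches
    ("superpixels") that are randomly switched off to create perturbations."""
    H, W, _ = image.shape
    side = int(np.sqrt(n_patches))
    ph, pw = H // side, W // side

    # Step 1: perturbations; each binary mask says which patches are kept
    masks = np.random.randint(0, 2, size=(n_samples, side * side))
    perturbed = []
    for m in masks:
        img = image.copy()
        for k, keep in enumerate(m):
            if not keep:
                r, c = divmod(k, side)
                img[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw, :] = 0.5  # gray out patch
        perturbed.append(img)

    # Step 2: query the black-box model on the perturbed images
    probs = predict_fn(np.stack(perturbed))[:, target_class]

    # Step 3: fit a simple weighted linear surrogate on the binary masks
    distance = 1.0 - masks.mean(axis=1)            # fraction of patches removed
    weights = np.exp(-(distance ** 2) / 0.25)      # proximity kernel (width is a choice)
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, probs, sample_weight=weights)
    return surrogate.coef_                         # importance of each patch for target_class
```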
Explaining by finding prototype class members
Global explanation technique
Find a pattern that maximizes the class activation
Not necessarily a real image
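A minimal activation-maximization sketch in PyTorch (the pretrained classifier, the input shape and the regularization strength are assumptions):

```python
import torch

def activation_maximization(model, target_class, steps=200, lr=0.1,
                            weight_decay=1e-4, shape=(1, 3, 224, 224)):
    """Synthesize an input that maximizes the score of target_class.
    The result is a pattern preferred by the network, not necessarily a real image."""
    model.eval()
    x = torch.randn(shape, requires_grad=True)           # start from random noise
    optimizer = torch.optim.Adam([x], lr=lr, weight_decay=weight_decay)
    for _ in range(steps):
        optimizer.zero_grad()
        score = model(x)[0, target_class]
        (-score).backward()                               # gradient ascent on the class score
        optimizer.step()
    return x.detach()
```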
Synthesizing preferred inputs for NNs via a deep generator network (DGN)
Step 1: Forward pass through the network
Step 2: Detect the maximum activation of the target unit
Step 3: Backward pass of the gradients into the generator's latent code
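A hedged sketch of the idea with illustrative pretrained modules: optimize the generator's latent code so the generated image maximizes the target unit (generator and classifier APIs are stand-ins, not the original code):

```python
import torch

def synthesize_via_generator(generator, classifier, target_class, z_dim=128,
                             steps=200, lr=0.05):
    """Optimize a latent code so the generated image maximizes a class score."""
    generator.eval(); classifier.eval()
    z = torch.randn(1, z_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        image = generator(z)                        # decode the latent code into an image
        score = classifier(image)[0, target_class]  # activation of the target unit
        (-score).backward()                         # gradients flow back through both networks
        optimizer.step()
    return generator(z).detach()
```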
Gradient-based methods
Magnitude of gradients reflects the importance of the input to the output scores (see sketch below)
Deconvnet: inverts the direction of applying the activation; zeroes out negative values in the backward signal
Guided Backpropagation: Deconvnet + backpropagating only positive gradients, setting negative ones to zero
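A minimal PyTorch sketch of vanilla gradient saliency plus a hook that turns ReLU backprop into guided backpropagation (model and input are assumed to be a standard classifier and a plain input tensor):

```python
import torch

def gradient_saliency(model, x, target_class):
    """Magnitude of d(score)/d(input) as a per-pixel importance map; x: (1, C, H, W)."""
    model.eval()
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.abs().max(dim=1)[0]     # max over color channels -> HxW saliency map

def guided_relu_hooks(model):
    """Register hooks so only positive gradients pass through ReLUs (guided backprop)."""
    def hook(module, grad_input, grad_output):
        return (torch.clamp(grad_input[0], min=0.0),)   # zero out negative gradients
    return [m.register_full_backward_hook(hook)
            for m in model.modules() if isinstance(m, torch.nn.ReLU)]
```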
GradCAM
Generate heatmap with regions of interest for output
Use the gradients flowing into the last convolutional layer -> determine the importance of each neuron / feature map
Global-average-pool the gradients -> weights per feature map; weighted sum of the feature maps (+ ReLU), upsampled to the input -> heatmap over the pixels
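A Grad-CAM sketch in PyTorch using forward/backward hooks (the choice of target_layer, e.g. the last conv layer such as model.layer4[-1] for a torchvision ResNet, is an assumption about the architecture):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_class, target_layer):
    """Grad-CAM heatmap: gradient-weighted combination of the feature maps of target_layer."""
    activations, gradients = {}, {}

    def fwd_hook(module, inp, out):
        activations["value"] = out

    def bwd_hook(module, grad_in, grad_out):
        gradients["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    model.eval()
    model.zero_grad()
    score = model(x)[0, target_class]
    score.backward()
    h1.remove(); h2.remove()

    acts = activations["value"]                        # (1, C, h, w) feature maps
    grads = gradients["value"]                         # (1, C, h, w) gradients of the score
    weights = grads.mean(dim=(2, 3), keepdim=True)     # per-feature-map importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))   # weighted sum + ReLU
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()        # normalized HxW heatmap
```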
Interpretable Explanations of Black Boxes by perturbation
Motivation: Gradient approaches not specific enough
Idea: Select the right perturbation to study its effect on f(x)
Explanation based on the changes applied to the input x
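A simplified sketch of the perturbation idea: learn a mask that deletes as little of x as possible while pushing the class score down (the gray baseline and the regularization weight are simplifying assumptions; the original work uses blur/noise perturbations and smoothness terms):

```python
import torch

def meaningful_perturbation(model, x, target_class, steps=300, lr=0.1, lam=1.0):
    """Optimize a deletion mask for x: (1, C, H, W); high values in the returned
    map mark regions whose removal most strongly lowers the target class score."""
    model.eval()
    baseline = torch.zeros_like(x) + 0.5                        # gray reference image
    mask = torch.ones(1, 1, *x.shape[2:], requires_grad=True)   # 1 = keep pixel
    optimizer = torch.optim.Adam([mask], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        m = mask.clamp(0, 1)
        perturbed = m * x + (1 - m) * baseline                  # apply the perturbation
        score = torch.softmax(model(perturbed), dim=1)[0, target_class]
        loss = score + lam * (1 - m).abs().mean()               # drop the score, delete little
        loss.backward()
        optimizer.step()
    return 1 - mask.detach().clamp(0, 1)                        # high value = important region
```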
Layer-wise Relevance Propagation (LRP)
Gradient methods suffer from the shattered gradients problem
Sensitivity explains changes to the prediction function, not the function itself
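LRP redistributes the prediction score backwards layer by layer instead of using gradients; a sketch of the epsilon-rule for a plain fully connected ReLU network (weights, biases and the single-sample input are illustrative placeholders):

```python
import numpy as np

def lrp_epsilon(weights, biases, x, target, eps=1e-6):
    """LRP epsilon-rule for a fully connected ReLU network.

    weights[i]: (d_in, d_out) matrix of layer i; biases[i]: (d_out,) vector;
    x: (d_in,) input. Returns a relevance score per input feature."""
    # forward pass, storing the activation of every layer
    activations = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ W + b
        # ReLU on hidden layers, identity on the output layer
        activations.append(np.maximum(0, z) if i < len(weights) - 1 else z)

    # start: all relevance sits on the explained output unit
    relevance = np.zeros_like(activations[-1])
    relevance[target] = activations[-1][target]

    # backward pass: redistribute relevance proportionally to the contributions
    for W, b, a in zip(weights[::-1], biases[::-1], activations[-2::-1]):
        z = a @ W + b
        z = z + eps * np.where(z >= 0, 1.0, -1.0)   # epsilon stabilizer
        s = relevance / z
        relevance = a * (s @ W.T)                   # R_j = a_j * sum_k W_jk * R_k / z_k
    return relevance
```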
Sanity Checks for Saliency Maps
Objective evaluation of interpretable methods
Observation: some saliency methods are independent of both the model and the data
Model parameter randomization test: perturb the model weights; data randomization test: permute the training labels
Check the saliency maps under increasing randomization -> if they always stay the same -> the method fails the check
Only Gradients and GradCAM passed
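A hedged sketch of the parameter randomization test, reusing the gradient_saliency sketch from above (randomizing all layers at once, the noise scale and the rank-correlation measure are simplifying assumptions; in practice the randomization is done layer by layer on a copy of the model):

```python
import torch
from scipy.stats import spearmanr

def parameter_randomization_test(model, x, target_class, saliency_fn):
    """Compare saliency maps before and after randomizing the model's weights.
    A rank correlation close to 1 means the map is insensitive to the model -> fail."""
    reference = saliency_fn(model, x, target_class).flatten()
    with torch.no_grad():
        for p in model.parameters():                 # randomize the weights
            p.copy_(torch.randn_like(p) * 0.05)
    randomized = saliency_fn(model, x, target_class).flatten()
    rank_corr, _ = spearmanr(reference.numpy(), randomized.numpy())
    return rank_corr
```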
Testing with Concept Activation Vectors (TCAV)
How important a concept was for a prediction of a trained model
Step 1: defining concept (e.g. stripes)
Step 2: Find pictures representing the concept and random ones
Step 3: Train linear classifier to separate them
Step 4: CAV is the orthogonal vector to the decision boundary
Step 5: Take the directional derivative of the probability of being a zebra with respect to the CAV -> how much will the probability of being a zebra change?
TCAV score: fraction of class-k inputs whose layer-l activation vector was positively influenced by concept C
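A TCAV sketch (concept_acts, random_acts and grads_for_class_k are assumed to be precomputed layer-l activations and layer-l gradients of the class-k logit; the logistic-regression classifier is one possible choice of linear classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Train a linear classifier concept vs. random; the CAV is the vector
    orthogonal to its decision boundary (the coefficient vector)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)

def tcav_score(grads_for_class_k, cav):
    """Fraction of class-k inputs whose directional derivative along the CAV
    is positive, i.e. whose prediction is positively influenced by the concept."""
    directional_derivatives = grads_for_class_k @ cav
    return float((directional_derivatives > 0).mean())
```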
Interpretability-driven sample selection for active learning (IDEAL)
Data mine saliency maps; select most informative samples
Improved learning rates via interpretability guidance
Can XAI be used as an inductive bias in DL?
Inductive bias: priors on desired properties of the model
SIBNet: saliency inductive bias network