XAI
Tradeoff between explainability and accuracy
DL has high accuracy but low explainability
Interpretability: Link between features and prediction can be understood by a human
Produce explanations without sacrificing accuracy
FAST: Fairness, Accountability, Sustainability, Transparency
Black boxes vs. white boxes
Output: Visualizations, summary statements, keywords, feature importance
Partial Dependence Plot (PDP)
Shows effect of 1 or 2 features on outcome
Shows the relation between the target and a feature
Plot the feature on the x-axis and the prediction on the y-axis
Simple, easy to implement
Does not account for correlated features, limited to at most 2-3 features, ignores the feature distribution, effects might cancel out
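A minimal sketch of computing a PDP by hand (assumes a fitted model with a scikit-learn-style predict method and a NumPy feature matrix; all names are illustrative):

```python
import numpy as np

def partial_dependence(model, X, feature_idx, grid_points=20):
    """Partial dependence of model.predict on one feature: for each grid value,
    clamp the feature to that value for all rows of X and average the predictions."""
    grid = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), grid_points)
    pdp = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value              # clamp the feature everywhere
        pdp.append(model.predict(X_mod).mean())    # average over the data
    return grid, np.array(pdp)

# Illustrative usage: plot grid (feature) on the x-axis, pdp on the y-axis
# grid, pdp = partial_dependence(fitted_model, X_train, feature_idx=0)
```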
Individual Conditional Expectation (ICE)
Shows how an individual instance's prediction changes when a feature changes
One feature at a time
PDP is the average of the ICE curves over all instances
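The per-instance version gives ICE curves; averaging them over all instances recovers the PDP (same illustrative names as in the PDP sketch above):

```python
import numpy as np

def ice_curves(model, X, feature_idx, grid_points=20):
    """One curve per instance: the prediction as the chosen feature is varied."""
    grid = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), grid_points)
    curves = np.empty((X.shape[0], grid_points))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value
        curves[:, j] = model.predict(X_mod)        # keep every individual prediction
    return grid, curves

# curves.mean(axis=0) equals the PDP from the sketch above
```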
Local Interpretable Model-agnostic Explanations (LIME)
Input: sample you want to get an explanation for
Step 1: Create perturbations of the input image
Step 2: Make predictions with the black box model
Step 3: Train a simple surrogate model to explain the prediction locally
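A hedged sketch of the three LIME steps for an image (predict_fn, the grid "superpixels", the gray fill and the kernel width are simplifying assumptions, not the original LIME implementation):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_image_sketch(image, predict_fn, target_class, n_patches=16, n_samples=500):
    """Minimal LIME-style explanation for one image.

    image: HxWxC float array in [0, 1]; predict_fn maps a batch of images
    to class probabilities. The image is split into a grid of patches
    ("superpixels") that are randomly switched off to create perturbations."""
    H, W, _ = image.shape
    side = int(np.sqrt(n_patches))
    ph, pw = H // side, W // side

    # Step 1: perturbations; each binary mask says which patches are kept
    masks = np.random.randint(0, 2, size=(n_samples, side * side))
    perturbed = []
    for m in masks:
        img = image.copy()
        for k, keep in enumerate(m):
            if not keep:
                r, c = divmod(k, side)
                img[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw, :] = 0.5  # gray out patch
        perturbed.append(img)

    # Step 2: query the black-box model on the perturbed images
    probs = predict_fn(np.stack(perturbed))[:, target_class]

    # Step 3: fit a simple weighted linear surrogate on the binary masks
    distance = 1.0 - masks.mean(axis=1)            # fraction of patches removed
    weights = np.exp(-(distance ** 2) / 0.25)      # proximity kernel (width is a choice)
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, probs, sample_weight=weights)
    return surrogate.coef_                         # importance of each patch for target_class
```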
Explaining by finding prototype class members
Global explanation technique
Find a pattern that maximizes the class activation
Not necessarily a real image
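A minimal activation-maximization sketch in PyTorch (the pretrained classifier, the input shape and the regularization strength are assumptions):

```python
import torch

def activation_maximization(model, target_class, steps=200, lr=0.1,
                            weight_decay=1e-4, shape=(1, 3, 224, 224)):
    """Synthesize an input that maximizes the score of target_class.
    The result is a pattern preferred by the network, not necessarily a real image."""
    model.eval()
    x = torch.randn(shape, requires_grad=True)           # start from random noise
    optimizer = torch.optim.Adam([x], lr=lr, weight_decay=weight_decay)
    for _ in range(steps):
        optimizer.zero_grad()
        score = model(x)[0, target_class]
        (-score).backward()                               # gradient ascent on the class score
        optimizer.step()
    return x.detach()
```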
Synthesizing preferred inputs for NNs via a deep generator network (DGN)
Step 1: Forward pass through the network
Step 2: Detect the maximum activation of the target unit
Step 3: Backward pass of the gradients into the generator's latent code
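A hedged sketch of the idea with illustrative pretrained modules: optimize the generator's latent code so the generated image maximizes the target unit (generator and classifier APIs are stand-ins, not the original code):

```python
import torch

def synthesize_via_generator(generator, classifier, target_class, z_dim=128,
                             steps=200, lr=0.05):
    """Optimize a latent code so the generated image maximizes a class score."""
    generator.eval(); classifier.eval()
    z = torch.randn(1, z_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        image = generator(z)                        # decode the latent code into an image
        score = classifier(image)[0, target_class]  # activation of the target unit
        (-score).backward()                         # gradients flow back through both networks
        optimizer.step()
    return generator(z).detach()
```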
Gradient-based methods
Magnitude of gradients reflects the importance of the input to the output scores (see sketch below)
Deconvnet: inverts the direction of applying the activation; zeroes out negative values in the backward signal
Guided Backpropagation: Deconvnet + backpropagating only positive gradients, setting negative ones to zero
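A minimal PyTorch sketch of vanilla gradient saliency plus a hook that turns ReLU backprop into guided backpropagation (model and input are assumed to be a standard classifier and a plain input tensor):

```python
import torch

def gradient_saliency(model, x, target_class):
    """Magnitude of d(score)/d(input) as a per-pixel importance map; x: (1, C, H, W)."""
    model.eval()
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.abs().max(dim=1)[0]     # max over color channels -> HxW saliency map

def guided_relu_hooks(model):
    """Register hooks so only positive gradients pass through ReLUs (guided backprop)."""
    def hook(module, grad_input, grad_output):
        return (torch.clamp(grad_input[0], min=0.0),)   # zero out negative gradients
    return [m.register_full_backward_hook(hook)
            for m in model.modules() if isinstance(m, torch.nn.ReLU)]
```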
GradCAM
Generate heatmap with regions of interest for output
Use the gradients flowing into the last convolutional layer -> determine the importance of each neuron / feature map
Global-average-pool the gradients -> weights per feature map; weighted sum of the feature maps (+ ReLU), upsampled to the input -> heatmap over the pixels
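A Grad-CAM sketch in PyTorch using forward/backward hooks (the choice of target_layer, e.g. the last conv layer such as model.layer4[-1] for a torchvision ResNet, is an assumption about the architecture):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_class, target_layer):
    """Grad-CAM heatmap: gradient-weighted combination of the feature maps of target_layer."""
    activations, gradients = {}, {}

    def fwd_hook(module, inp, out):
        activations["value"] = out

    def bwd_hook(module, grad_in, grad_out):
        gradients["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    model.eval()
    model.zero_grad()
    score = model(x)[0, target_class]
    score.backward()
    h1.remove(); h2.remove()

    acts = activations["value"]                        # (1, C, h, w) feature maps
    grads = gradients["value"]                         # (1, C, h, w) gradients of the score
    weights = grads.mean(dim=(2, 3), keepdim=True)     # per-feature-map importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))   # weighted sum + ReLU
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()        # normalized HxW heatmap
```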
Interpretable Explanations of Black Boxes by perturbation
Motivation: Gradient approaches not specific enough
Idea: Select the right perturbation to study its effect on f(x)
Explanation based on the changes applied to the input x
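A simplified sketch of the perturbation idea: learn a mask that deletes as little of x as possible while pushing the class score down (the gray baseline and the regularization weight are simplifying assumptions; the original work uses blur/noise perturbations and smoothness terms):

```python
import torch

def meaningful_perturbation(model, x, target_class, steps=300, lr=0.1, lam=1.0):
    """Optimize a deletion mask for x: (1, C, H, W); high values in the returned
    map mark regions whose removal most strongly lowers the target class score."""
    model.eval()
    baseline = torch.zeros_like(x) + 0.5                        # gray reference image
    mask = torch.ones(1, 1, *x.shape[2:], requires_grad=True)   # 1 = keep pixel
    optimizer = torch.optim.Adam([mask], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        m = mask.clamp(0, 1)
        perturbed = m * x + (1 - m) * baseline                  # apply the perturbation
        score = torch.softmax(model(perturbed), dim=1)[0, target_class]
        loss = score + lam * (1 - m).abs().mean()               # drop the score, delete little
        loss.backward()
        optimizer.step()
    return 1 - mask.detach().clamp(0, 1)                        # high value = important region
```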
Layer-wise Relevance Propagation (LRP)
Gradient methods suffer from the shattered gradients problem
Sensitivity explains changes to the prediction function, not the function itself
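LRP redistributes the prediction score backwards layer by layer instead of using gradients; a sketch of the epsilon-rule for a plain fully connected ReLU network (weights, biases and the single-sample input are illustrative placeholders):

```python
import numpy as np

def lrp_epsilon(weights, biases, x, target, eps=1e-6):
    """LRP epsilon-rule for a fully connected ReLU network.

    weights[i]: (d_in, d_out) matrix of layer i; biases[i]: (d_out,) vector;
    x: (d_in,) input. Returns a relevance score per input feature."""
    # forward pass, storing the activation of every layer
    activations = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ W + b
        # ReLU on hidden layers, identity on the output layer
        activations.append(np.maximum(0, z) if i < len(weights) - 1 else z)

    # start: all relevance sits on the explained output unit
    relevance = np.zeros_like(activations[-1])
    relevance[target] = activations[-1][target]

    # backward pass: redistribute relevance proportionally to the contributions
    for W, b, a in zip(weights[::-1], biases[::-1], activations[-2::-1]):
        z = a @ W + b
        z = z + eps * np.where(z >= 0, 1.0, -1.0)   # epsilon stabilizer
        s = relevance / z
        relevance = a * (s @ W.T)                   # R_j = a_j * sum_k W_jk * R_k / z_k
    return relevance
```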
Sanity Checks for Saliency Maps
Objective evaluation of interpretable methods
Observation: some saliency methods are independent of both the model and the data
Model parameter randomization test: perturb the model weights; data randomization test: permute the training labels
Check the saliency maps under increasing randomization -> if they always stay the same -> the method fails the check
Only Gradients and GradCAM passed
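A hedged sketch of the parameter randomization test, reusing the gradient_saliency sketch from above (randomizing all layers at once, the noise scale and the rank-correlation measure are simplifying assumptions; in practice the randomization is done layer by layer on a copy of the model):

```python
import torch
from scipy.stats import spearmanr

def parameter_randomization_test(model, x, target_class, saliency_fn):
    """Compare saliency maps before and after randomizing the model's weights.
    A rank correlation close to 1 means the map is insensitive to the model -> fail."""
    reference = saliency_fn(model, x, target_class).flatten()
    with torch.no_grad():
        for p in model.parameters():                 # randomize the weights
            p.copy_(torch.randn_like(p) * 0.05)
    randomized = saliency_fn(model, x, target_class).flatten()
    rank_corr, _ = spearmanr(reference.numpy(), randomized.numpy())
    return rank_corr
```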
Testing with Concept Activation Vectors (TCAV)
How important a concept was for a prediction of a trained model
Step 1: defining concept (e.g. stripes)
Step 2: Find pictures representing the concept and random ones
Step 3: Train linear classifier to separate them
Step 4: CAV is the orthogonal vector to the decision boundary
Step 5: Take the directional derivative of the probability of being a zebra with respect to the CAV -> how much will the probability of being a zebra change?
TCAV score: fraction of class-k inputs whose layer-l activation vector was positively influenced by concept C
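A TCAV sketch (concept_acts, random_acts and grads_for_class_k are assumed to be precomputed layer-l activations and layer-l gradients of the class-k logit; the logistic-regression classifier is one possible choice of linear classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Train a linear classifier concept vs. random; the CAV is the vector
    orthogonal to its decision boundary (the coefficient vector)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)

def tcav_score(grads_for_class_k, cav):
    """Fraction of class-k inputs whose directional derivative along the CAV
    is positive, i.e. whose prediction is positively influenced by the concept."""
    directional_derivatives = grads_for_class_k @ cav
    return float((directional_derivatives > 0).mean())
```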
Interpretability-driven sample selection for active learning (IDEAL)
Data mine saliency maps; select most informative samples
Improved learning rates via interpretability guidance
Can XAI be used as an inductive bias in DL?
Inductive bias: priors on desired properties of the model
SIBNet: saliency inductive bias network