Interpretable Counterfactual Explanations Guided by Prototypes

About

We propose a fast, model-agnostic method for finding interpretable counterfactual explanations of classifier predictions by using class prototypes. We show that class prototypes, obtained using either an encoder or class-specific k-d trees, significantly speed up the search for counterfactual instances and result in more interpretable explanations. We introduce two novel metrics to quantitatively evaluate local interpretability at the instance level. We use these metrics to illustrate the effectiveness of our method on an image dataset and a tabular dataset, MNIST and Breast Cancer Wisconsin (Diagnostic), respectively. The method also eliminates the computational bottleneck that arises from numerical gradient evaluation for black box models.
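The abstract states that class prototypes can be obtained through class-specific k-d trees. As a rough illustration only, the sketch below builds one k-d tree per class with SciPy's `cKDTree`, takes the nearest training instance from any class other than the original prediction as the prototype, and scores a candidate counterfactual with a squared-L2 prototype term. The helper names (`build_class_trees`, `nearest_prototype`, `proto_loss`), the use of the single nearest instance as the prototype, and the weight `theta` are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree


def build_class_trees(X, y):
    """Build one k-d tree per class label (hypothetical helper)."""
    return {c: cKDTree(X[y == c]) for c in np.unique(y)}


def nearest_prototype(x, trees, exclude_class):
    """Return (class, prototype) for the closest class other than exclude_class.

    Assumption: the prototype is the single nearest training instance
    stored in that class's k-d tree.
    """
    best = None
    for c, tree in trees.items():
        if c == exclude_class:
            continue  # skip the class originally predicted for x
        dist, idx = tree.query(x, k=1)  # nearest neighbour within class c
        if best is None or dist < best[0]:
            best = (dist, c, tree.data[idx])
    if best is None:
        raise ValueError("need at least one class besides exclude_class")
    return best[1], best[2]


def proto_loss(x_cf, proto, theta=1.0):
    """Squared L2 distance to the prototype, scaled by a hypothetical weight theta."""
    return theta * float(np.sum((x_cf - proto) ** 2))


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
    y = np.array([0, 0, 1, 1])
    trees = build_class_trees(X, y)
    c, proto = nearest_prototype(np.array([0.0, 0.5]), trees, exclude_class=0)
    print(c, proto)
```

In a complete search, a term like `proto_loss` would presumably be combined with prediction and sparsity terms, so that the perturbed instance is pulled toward a region that is representative of the target class rather than toward an arbitrary point across the decision boundary.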

Arnaud Van Looveren, Janis Klaise · 2019

Related benchmarks

Task	Dataset	Result
Counterfactual Explanations	moons	Validity: 100	19
Counterfactual Explanations	HELOC	Validity: 100	19
Counterfactual Explanations	Law	Validity: 100	18
Counterfactual Explanation Generation	Blobs	Validity: 100	17
Counterfactual Explanation Generation	Digits	Validity: 100	17
Counterfactual Explanation Generation	Wine	Validity: 1	17
Counterfactual Explanation	140 datasets	Fidelity: 4.21	8
Counterfactual Explanations	Audit	Coverage: 1	6
Counterfactual Generation	FMCW radar dataset diagonal gestures	Interpretability Score: 2.1	6
Counterfactual Generation	FMCW radar dataset	Proximity: 1.8	6

