Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations

About

While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or are hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor models in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method, which learn a predictor model that approximates the true data-generating distribution given any subset of the input. We show that EVAL-X can detect when predictions are encoded in interpretations, and we show the advantages of REAL-X through quantitative and radiologist evaluation.
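The encoding problem described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dataset, the function names (`encoding_selector`, `honest_selector`, `joint_predictor`, `true_conditional`, `score`), and the assumption that feature 1 is independent of the label with p(y = 1) = 0.5 are all hypothetical, chosen only to show how a selector trained in concert with a predictor could pass the label through the choice of index, and how an evaluator that models the label given only the revealed feature would expose this.

```python
# Toy illustration only (hypothetical names and data; not the paper's code).
# The label depends only on feature 0: y = 1 if x0 > 0, else 0.
# Feature 1 is assumed to be noise, independent of the label.
DATA = [(-2.0, 0.3), (-1.0, -0.7), (1.0, 0.5), (2.0, -0.1)]


def label(x):
    return 1 if x[0] > 0 else 0


def encoding_selector(x):
    """Looks at the full input and encodes the label in WHICH index it selects."""
    return 0 if label(x) == 1 else 1


def honest_selector(x):
    """Always selects the feature that actually determines the label."""
    return 0


def joint_predictor(index, value):
    """Predictor co-adapted to encoding_selector: reads p(y=1) off the index alone."""
    return 1.0 if index == 0 else 0.0


def true_conditional(index, value):
    """p(y=1 | revealed feature) under the assumed data distribution."""
    if index == 0:
        return 1.0 if value > 0 else 0.0
    return 0.5  # assumption: feature 1 carries no information about y


def score(selector, predictor):
    """Mean probability that the predictor assigns to the true label."""
    total = 0.0
    for x in DATA:
        i = selector(x)
        p1 = predictor(i, x[i])
        total += p1 if label(x) == 1 else 1.0 - p1
    return total / len(DATA)


print(score(encoding_selector, joint_predictor))   # jointly trained pair
print(score(encoding_selector, true_conditional))  # same selector, subset-aware evaluator
print(score(honest_selector, true_conditional))    # informative selection
```

In this sketch the jointly trained pair looks perfect even though half of its selections reveal a feature that is pure noise; scoring the same selections with a predictor that only uses the revealed value lowers the score, while the selector that reveals the informative feature keeps it.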

Neil Jethani, Mukund Sudarshan, Yindalon Aphinyanaphongs, Rajesh Ranganath • 2021

Related benchmarks

Task	Dataset	Result
Classification	Lung	ACC 93.27	96
Classification	GLI_85	Accuracy 83.24	88
Classification	Adult	Accuracy 36.48	86
Classification	TOX_171	Accuracy 90.79	78
Classification	Colon	Accuracy 76.75	78
Classification	ALLAML	Accuracy 84.16	72
Classification	SMK_CAN_187	Accuracy 56.48	72
Classification	ARCENE	Accuracy 77.3	70
Classification	HDLSS Datasets Summary	Average Rank 12.75	66
Classification	Prostate_GE	Accuracy 86.75	64

Showing 10 of 21 rows
