A Framework to Learn with Interpretation
About
To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive model in terms of human-understandable high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small-size dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the attribute activations with an entropy-based criterion while enforcing fidelity to both the inputs and the outputs of the predictive model. A detailed pipeline to visualize the learnt features is also developed. Moreover, besides generating interpretable models by design, our approach can be specialized to provide post-hoc interpretations for a pre-trained neural network. We validate our approach against several state-of-the-art methods on multiple datasets and show its efficacy on both kinds of tasks.
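The sketch below illustrates, in PyTorch, how such a joint predictor/interpreter setup can be wired up: a predictor exposing selected hidden layers, a small dictionary of attribute functions feeding a linear classifier, an entropy-based conciseness penalty on the attribute activations, and an output-fidelity term. All names, layer sizes, and loss weights (`Predictor`, `Interpreter`, `num_attributes`, `beta`, `gamma`) are illustrative assumptions, not the paper's actual implementation; the input-fidelity term (a decoder reconstructing the input from the attributes) is omitted for brevity.

```python
# Minimal sketch of a jointly trained predictor + interpreter.
# Assumes CIFAR-like inputs of shape (B, 3, 32, 32); all hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Predictor(nn.Module):
    """Predictive model f; exposes selected hidden layers to the interpreter."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        h1 = self.block1(x)                       # selected hidden layer 1
        h2 = self.block2(h1)                      # selected hidden layer 2
        return self.head(h2.flatten(1)), (h1, h2)

class Interpreter(nn.Module):
    """Small dictionary of attribute functions over the selected hidden layers,
    followed by a linear classifier over the attribute activations."""
    def __init__(self, num_attributes=24, num_classes=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # fixed-size summary of each layer
        self.attributes = nn.Sequential(nn.Linear(32 + 64, num_attributes), nn.ReLU())
        self.classifier = nn.Linear(num_attributes, num_classes)

    def forward(self, hidden):
        z = torch.cat([self.pool(h).flatten(1) for h in hidden], dim=1)
        phi = self.attributes(z)                  # non-negative attribute activations
        return self.classifier(phi), phi

def conciseness_loss(phi, eps=1e-8):
    """Entropy-based conciseness: pushes each sample's attribute activations
    toward a sparse (low-entropy) distribution over the dictionary."""
    p = phi / (phi.sum(dim=1, keepdim=True) + eps)
    return -(p * torch.log(p + eps)).sum(dim=1).mean()

def training_step(predictor, interpreter, x, y, beta=0.1, gamma=0.1):
    logits_f, hidden = predictor(x)
    logits_g, phi = interpreter(hidden)
    loss_pred = F.cross_entropy(logits_f, y)      # predictive accuracy
    # Output fidelity: the interpreter's prediction should match the
    # predictor's soft output (both models are trained jointly).
    loss_fid = F.kl_div(F.log_softmax(logits_g, dim=1),
                        F.softmax(logits_f, dim=1),
                        reduction="batchmean")
    loss_ent = conciseness_loss(phi)              # conciseness of activations
    return loss_pred + beta * loss_fid + gamma * loss_ent
```

Keeping the final classifier linear over a concise set of attribute activations is what makes each prediction directly readable: the class score decomposes into a few attribute contributions, which is the property the visualization pipeline then exploits.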
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy | 79.6 | 3381 |
| Image Classification | MNIST (test) | Accuracy | 99.4 | 882 |
| Image Classification | SVHN (test) | Accuracy | 90.8 | 362 |
| Image Classification | F-MNIST (test) | Accuracy | 91.5 | 64 |
| Environmental Sound Classification | ESC-50 (test) | Top-1 Fidelity | 73.5 | 14 |
| Image Classification | QuickDraw (test) | Accuracy | 82.6 | 5 |
| Multi-Label Urban Sound Tagging | SONYC-UST | Macro AUPRC | 81.6 | 4 |