
A Framework to Learn with Interpretation

About

To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability of the predictive model in terms of human-understandable, high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the activation of attributes with an entropy-based criterion, while enforcing fidelity to both the inputs and the outputs of the predictive model. A detailed pipeline to visualize the learnt features is also developed. Moreover, besides generating interpretable models by design, our approach can be specialized to provide post-hoc interpretations for a pre-trained neural network. We validate our approach against several state-of-the-art methods on multiple datasets and show its efficacy on both kinds of tasks.
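The abstract describes the interpreter's architecture: a small dictionary of attribute functions applied to selected hidden-layer outputs, a linear classifier on the attribute activations, an entropy-based conciseness penalty, and a fidelity term to the predictive model's outputs. A minimal numpy sketch of these pieces is given below; all dimensions, names, and the squared-error fidelity term are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d hidden features, k attributes, c classes.
d, k, c = 32, 8, 5

# Dictionary of attribute functions: here a single linear layer + ReLU
# mapping selected hidden-layer outputs to k attribute activations.
W_attr = rng.normal(size=(k, d)) * 0.1
# Linear classifier on top of the attribute activations.
W_cls = rng.normal(size=(c, k)) * 0.1

def attributes(h):
    """Attribute activations for a vector of hidden features h."""
    return np.maximum(0.0, W_attr @ h)

def interpreter_logits(h):
    """Interpreter prediction: linear classifier over attributes."""
    return W_cls @ attributes(h)

def entropy_conciseness(phi, eps=1e-12):
    """Entropy of the normalized attribute activations: low entropy
    means few attributes fire, i.e. a concise interpretation."""
    p = phi / (phi.sum() + eps)
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def output_fidelity(pred_logits, interp_logits):
    """Squared-error fidelity between the predictive model's outputs
    and the interpreter's outputs (a stand-in for the paper's term)."""
    return float(((pred_logits - interp_logits) ** 2).mean())

# Toy forward pass: hidden features from the predictor, then the
# regularized interpreter objective (weights are illustrative).
h = np.abs(rng.normal(size=d))
phi = attributes(h)
loss = output_fidelity(rng.normal(size=c), interpreter_logits(h)) \
       + 0.1 * entropy_conciseness(phi)
```

The entropy penalty drives the normalized activation distribution toward a few dominant attributes, which is what makes each local interpretation readable.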

Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy | 79.6 | 3381 |
| Image Classification | MNIST (test) | Accuracy | 99.4 | 882 |
| Image Classification | SVHN (test) | Accuracy | 90.8 | 362 |
| Image Classification | F-MNIST (test) | Accuracy | 91.5 | 64 |
| Environmental Sound Classification | ESC-50 (test) | Top-1 Fidelity | 73.5 | 14 |
| Image Classification | QuickDraw (test) | Accuracy | 82.6 | 5 |
| Multi-Label Urban Sound Tagging | SONYC-UST | Macro AUPRC | 81.6 | 4 |
