Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
About
Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version.

Original abstract: The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.
Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, Anshul Kundaje • 2016
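The abstract's core idea, comparing each neuron's activation to a reference activation and attributing the difference, can be illustrated with a minimal sketch. The example below is an assumption-laden toy (a single dense layer plus ReLU, with a made-up `deeplift_contributions` helper), not the authors' implementation; it shows the difference-from-reference bookkeeping in the simplest case.

```python
import numpy as np

def deeplift_contributions(x, x_ref, W, b):
    """Toy difference-from-reference attribution for one dense unit + ReLU.

    Attributes the change in the ReLU output (actual input vs. reference
    input) back to each input feature. Illustrative sketch only.
    """
    pre = W @ x + b            # pre-activation on the actual input
    pre_ref = W @ x_ref + b    # pre-activation on the reference input
    out = max(pre, 0.0)        # ReLU output, actual
    out_ref = max(pre_ref, 0.0)  # ReLU output, reference

    delta_pre = pre - pre_ref
    # Multiplier = (change in output) / (change in pre-activation);
    # this stays finite where a plain gradient would be zero (saturated ReLU).
    multiplier = (out - out_ref) / delta_pre if delta_pre != 0 else 0.0
    # Each input's share of the pre-activation change is w_i * (x_i - x_ref_i).
    return multiplier * W * (x - x_ref)

W = np.array([1.0, -2.0, 0.5])
b = -0.5
x = np.array([1.0, 0.5, 2.0])
x_ref = np.zeros(3)  # all-zeros reference, a common illustrative choice

contribs = deeplift_contributions(x, x_ref, W, b)
# By construction the contributions sum to the output difference:
# sum(contribs) == relu(W @ x + b) - relu(W @ x_ref + b)
```

The key property is the completeness of the decomposition: the per-feature scores always add up to the total change in output relative to the reference, even through the nonlinearity.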
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Explainability | ImageNet (val) | Insertion | 36.3 | 104 |
| Attribution Fidelity | ImageNet 1,000 images (val) | µFidelity | 0.157 | 48 |
| Deletion | ImageNet 2,000 images (val) | Deletion Score | 0.14 | 48 |
| Feature Attribution | Rotten Tomatoes fine-tuned | LO | -0.121 | 18 |
| Feature Attribution | IMDB (test) | LO | -0.0892 | 18 |
| Feature Attribution | SST2 | LO | -0.199 | 18 |
| Explanation Faithfulness | IMDB Review 1,000 sentences (val) | Word Deletion Score | 68.2 | 14 |
| Feature Attribution | Synthetic half-moons dataset with Gaussian noise (std dev 0.05-0.65) | AUC-Purity | 0.328 | 10 |
| Feature Attribution | Pascal VOC (test) | AUC-Comp | 0.21 | 8 |