Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
About
Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version.

Original abstract: The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.
Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, Anshul Kundaje • 2016
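The abstract's core idea, comparing each neuron's activation to a reference activation and attributing the difference, can be illustrated with a minimal sketch. The example below is an assumption-laden toy (a single dense layer plus ReLU, with a made-up `deeplift_contributions` helper), not the authors' implementation; it shows the difference-from-reference bookkeeping in the simplest case.

```python
import numpy as np

def deeplift_contributions(x, x_ref, W, b):
    """Toy difference-from-reference attribution for one dense unit + ReLU.

    Attributes the change in the ReLU output (actual input vs. reference
    input) back to each input feature. Illustrative sketch only.
    """
    pre = W @ x + b            # pre-activation on the actual input
    pre_ref = W @ x_ref + b    # pre-activation on the reference input
    out = max(pre, 0.0)        # ReLU output, actual
    out_ref = max(pre_ref, 0.0)  # ReLU output, reference

    delta_pre = pre - pre_ref
    # Multiplier = (change in output) / (change in pre-activation);
    # this stays finite where a plain gradient would be zero (saturated ReLU).
    multiplier = (out - out_ref) / delta_pre if delta_pre != 0 else 0.0
    # Each input's share of the pre-activation change is w_i * (x_i - x_ref_i).
    return multiplier * W * (x - x_ref)

W = np.array([1.0, -2.0, 0.5])
b = -0.5
x = np.array([1.0, 0.5, 2.0])
x_ref = np.zeros(3)  # all-zeros reference, a common illustrative choice

contribs = deeplift_contributions(x, x_ref, W, b)
# By construction the contributions sum to the output difference:
# sum(contribs) == relu(W @ x + b) - relu(W @ x_ref + b)
```

The key property is the completeness of the decomposition: the per-feature scores always add up to the total change in output relative to the reference, even through the nonlinearity.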
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Explainability | ImageNet (val) | Insertion | 36.3 | 104 |
| Attribution Fidelity | ImageNet 1,000 images (val) | µFidelity | 0.157 | 48 |
| Deletion | ImageNet 2,000 images (val) | Deletion Score | 0.14 | 48 |
| Feature Attribution | Rotten Tomatoes fine-tuned | LO | -0.121 | 18 |
| Feature Attribution | IMDB (test) | LO | -0.0892 | 18 |
| Feature Attribution | SST2 | LO | -0.199 | 18 |
| Explanation Faithfulness | IMDB Review 1,000 sentences (val) | Word Deletion Score | 68.2 | 14 |
| Feature Attribution | Synthetic half-moons dataset with Gaussian noise (std dev 0.05-0.65) | AUC-Purity | 0.328 | 10 |
| Feature Attribution | Pascal VOC (test) | AUC-Comp | 0.21 | 8 |