Axiomatic Attribution for Deep Networks
About
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
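To make the "few calls to the standard gradient operator" concrete, here is a minimal sketch of Integrated Gradients in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the toy logistic model, its `WEIGHTS`, the hand-written `model_grad`, the all-zeros baseline, and the midpoint Riemann sum with `steps=200` are all hypothetical choices made for this example. The attribution for feature i is (x_i - baseline_i) times the average gradient along the straight line from the baseline to the input.

```python
import math

# Hypothetical toy model (assumption for illustration): one logistic unit.
WEIGHTS = [2.0, -1.0, 0.5]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x):
    """Scalar prediction of the toy model for input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)))


def model_grad(x):
    """Analytic gradient of model(x) with respect to each input feature.

    In a real framework this would be one call to the gradient operator.
    """
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        # Point on the straight-line path from baseline to input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # assumed baseline; the right choice is task-dependent
attributions = integrated_gradients(x, baseline, model_grad)

# The attributions should sum (approximately) to model(x) - model(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
```

With an automatic-differentiation framework, `model_grad` would typically be replaced by the framework's gradient call and the loop batched over the interpolation points. The number of steps trades accuracy for compute, and checking that `completeness_gap` is small is a cheap way to judge whether enough steps were used.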
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | MNIST (test) | Accuracy | 91.98 | 882 |
| Image Classification | SVHN (test) | Accuracy | 64.31 | 362 |
| Intent Classification | Banking77 (test) | Accuracy | 93.02 | 151 |
| Explainability | ImageNet (val) | Insertion | 38.6 | 104 |
| Interpretation Error Evaluation | ImageNet | Interpretation Error | 17.08 | 80 |
| Text Classification | IMDB (test) | CA | 84.2 | 79 |
| Localization | ImageNet-1k (val) | -- | -- | 79 |
| Feature Attribution Plausibility | MDACE (test) | P | 33.1 | 65 |
| Feature Relevance Evaluation | ImageNet (test) | R (Feature Relevance) | 0.35 | 60 |
| Interpretation | SST-2 | L2 Norm | 0.1133 | 56 |