Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps
About
Transformer-based language models have significantly advanced the state of the art on many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM) - a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's prediction. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
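The core idea, as described in the paper, is to weight each self-attention map by the (ReLU-gated) gradient of the prediction with respect to that map, then average the result into a per-token relevance score. The sketch below illustrates this with NumPy on random stand-in arrays; in real use, `attentions` and `gradients` would come from forward/backward hooks on a transformer, and the shapes and averaging scheme here are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def grad_sam_scores(attentions, gradients):
    """Per-token relevance from attention maps and their gradients.

    attentions: array (layers, heads, n, n), row-stochastic attention maps.
    gradients:  array (layers, heads, n, n), d(prediction)/d(attention),
                here a random stand-in for gradients from backprop.
    Returns an array of shape (n,) with one nonnegative score per token.
    """
    # Hadamard product of attention with ReLU-gated gradients:
    # negative gradients are zeroed so only positively contributing
    # attention entries count toward relevance.
    gated = attentions * np.maximum(gradients, 0.0)
    # Average over layers, heads, and query positions to get one
    # score per input (key) token.
    return gated.mean(axis=(0, 1, 2))

# Demo on random data (stand-ins for real attentions/gradients).
rng = np.random.default_rng(0)
layers, heads, n = 2, 4, 6
A = rng.random((layers, heads, n, n))
A /= A.sum(axis=-1, keepdims=True)          # rows sum to 1, like softmax
G = rng.standard_normal((layers, heads, n, n))
scores = grad_sam_scores(A, G)
print(scores.shape)  # (6,) -- one relevance score per token
```

Tokens with the highest scores are the ones the method flags as most explanatory for the prediction.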
Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein • 2022
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Negative temporal attribution | FordA | Δŷc (2%) | -0.1 | 14 |
| Hate Speech Detection | HateXplain (held-out) | F1 Score | 39.6 | 14 |
| Grammatical Acceptability | CoLA (held-out) | F1 Score | 35.6 | 14 |
| Time Series Attribution | SeqComb-UV synthetic (test) | AUPRC | 61 | 14 |
| Temporal Attribution | FORD-A | I(100) | 74.17 | 14 |
| Sentiment Analysis | SST-2 (held-out) | F1 Score | 23.4 | 14 |
| Time Series Attribution | FreqSum synthetic (test) | AUPRC | 0.67 | 13 |
| Negative temporal attribution | Audio | Δŷc (2%) | -0.16 | 13 |
| Time Series Attribution | SeqComb-MV synthetic (test) | AUPRC | 61 | 13 |
| Negative temporal attribution | EEG | Δŷc (2%) | 0.06 | 13 |
*Showing 10 of 14 rows.*