
Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps

About

Transformer-based language models have significantly advanced the state of the art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM), a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's prediction. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.

Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein • 2022
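
The abstract is terse about the mechanism, so below is a minimal sketch of how a gradient-weighted self-attention attribution in the spirit of Grad-SAM can be assembled with PyTorch and Hugging Face Transformers. The checkpoint name, the ReLU gating of the gradients, and the layer/head/query aggregation are illustrative assumptions, not necessarily the authors' exact formulation.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical SST-2 checkpoint, chosen for illustration only.
MODEL_NAME = "textattack/bert-base-uncased-SST-2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, output_attentions=True
)
model.eval()  # disable dropout so attention maps match inference behavior

inputs = tokenizer("a gripping, beautifully shot film", return_tensors="pt")
outputs = model(**inputs)
attentions = outputs.attentions  # one (batch, heads, seq, seq) tensor per layer

# Gradient of the predicted-class logit w.r.t. every self-attention map.
pred = outputs.logits.argmax(dim=-1).item()
grads = torch.autograd.grad(outputs.logits[0, pred], attentions)

# Attention weighted by the ReLU of its gradient, averaged over layers and
# heads, then summed over the query axis to get one score per input token.
seq_len = inputs["input_ids"].shape[1]
scores = torch.zeros(seq_len)
for attn, grad in zip(attentions, grads):
    weighted = attn[0] * torch.relu(grad[0])   # (heads, seq, seq)
    scores += weighted.mean(dim=0).sum(dim=0)  # average heads, sum queries
scores /= len(attentions)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in sorted(zip(tokens, scores.tolist()), key=lambda p: -p[1]):
    print(f"{tok:>12}  {score:.4f}")
```

Gating the gradient with a ReLU keeps only those attention units whose activation supports the predicted class, which matches the intuition the title points to; the precise normalization and aggregation in the published method may differ from this sketch.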

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Negative temporal attribution | FordA | Δŷc (2%) | -0.1 | 14 |
| Hate Speech Detection | HateXplain (held-out) | F1 Score | 39.6 | 14 |
| Grammatical Acceptability | CoLA (held-out) | F1 Score | 35.6 | 14 |
| Time Series Attribution | SeqComb-UV synthetic (test) | AUPRC | 61 | 14 |
| Temporal Attribution | FORD-A | I(100) | 74.17 | 14 |
| Sentiment Analysis | SST-2 (held-out) | F1 Score | 23.4 | 14 |
| Time Series Attribution | FreqSum synthetic (test) | AUPRC | 0.67 | 13 |
| Negative temporal attribution | Audio | Δŷc (2%) | -0.16 | 13 |
| Time Series Attribution | SeqComb-MV synthetic (test) | AUPRC | 61 | 13 |
| Negative temporal attribution | EEG | Δŷc (2%) | 0.06 | 13 |

Showing 10 of 14 rows.
