Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps
About
Transformer-based language models have significantly advanced the state of the art on many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM) - a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's prediction. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
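The core idea, as described in the paper, is to weight each self-attention map by the (ReLU-gated) gradient of the prediction with respect to that map, then average the result into a per-token relevance score. The sketch below illustrates this with NumPy on random stand-in arrays; in real use, `attentions` and `gradients` would come from forward/backward hooks on a transformer, and the shapes and averaging scheme here are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def grad_sam_scores(attentions, gradients):
    """Per-token relevance from attention maps and their gradients.

    attentions: array (layers, heads, n, n), row-stochastic attention maps.
    gradients:  array (layers, heads, n, n), d(prediction)/d(attention),
                here a random stand-in for gradients from backprop.
    Returns an array of shape (n,) with one nonnegative score per token.
    """
    # Hadamard product of attention with ReLU-gated gradients:
    # negative gradients are zeroed so only positively contributing
    # attention entries count toward relevance.
    gated = attentions * np.maximum(gradients, 0.0)
    # Average over layers, heads, and query positions to get one
    # score per input (key) token.
    return gated.mean(axis=(0, 1, 2))

# Demo on random data (stand-ins for real attentions/gradients).
rng = np.random.default_rng(0)
layers, heads, n = 2, 4, 6
A = rng.random((layers, heads, n, n))
A /= A.sum(axis=-1, keepdims=True)          # rows sum to 1, like softmax
G = rng.standard_normal((layers, heads, n, n))
scores = grad_sam_scores(A, G)
print(scores.shape)  # (6,) -- one relevance score per token
```

Tokens with the highest scores are the ones the method flags as most explanatory for the prediction.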
Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein • 2022
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Negative temporal attribution | FordA | Δŷc (2%) | -0.1 | 14 |
| Hate Speech Detection | HateXplain (held-out) | F1 Score | 39.6 | 14 |
| Grammatical Acceptability | CoLA (held-out) | F1 Score | 35.6 | 14 |
| Time Series Attribution | SeqComb-UV synthetic (test) | AUPRC | 61 | 14 |
| Temporal Attribution | FORD-A | I(100) | 74.17 | 14 |
| Sentiment Analysis | SST-2 (held-out) | F1 Score | 23.4 | 14 |
| Time Series Attribution | FreqSum synthetic (test) | AUPRC | 0.67 | 13 |
| Negative temporal attribution | Audio | Δŷc (2%) | -0.16 | 13 |
| Time Series Attribution | SeqComb-MV synthetic (test) | AUPRC | 61 | 13 |
| Negative temporal attribution | EEG | Δŷc (2%) | 0.06 | 13 |
*Showing 10 of 14 rows.*