Quantifying Context Mixing in Transformers

About

Self-attention weights and their transformed variants have been the main source of information for analyzing token-to-token interactions in Transformer-based models. But despite their ease of interpretation, these weights are not faithful to the models' decisions, as they are only one part of an encoder, and other components in the encoder layer can have a considerable impact on information mixing in the output representations. In this work, by expanding the scope of analysis to the whole encoder block, we propose Value Zeroing, a novel context mixing score customized for Transformers that provides a deeper understanding of how information is mixed at each encoder layer. We demonstrate the superiority of our context mixing score over other analysis methods through a series of complementary evaluations from different viewpoints: linguistically informed rationales, probing, and faithfulness analysis.
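The core idea behind Value Zeroing can be illustrated simply: zero out one token's value vector inside a layer, re-run that layer, and measure how much every other token's output representation changes. Below is a minimal, hypothetical sketch in plain PyTorch on a toy single-head attention block (no feed-forward sublayer, layer normalization, or multi-head structure, which a full encoder block would include); the function names, weight matrices, and the use of cosine distance as the change measure are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def encoder_layer(x, Wq, Wk, Wv, Wo, zero_value_of=None):
    """Toy single-head self-attention block with a residual connection.
    If zero_value_of is a token index, that token's value vector is zeroed
    before mixing, so it cannot pass information to other positions."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    if zero_value_of is not None:
        v = v.clone()
        v[zero_value_of] = 0.0                     # the "value zeroing" step
    attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return (attn @ v) @ Wo + x                     # keep the residual stream

def value_zeroing_scores(x, Wq, Wk, Wv, Wo):
    """scores[i, j]: how much token i's output changes when token j's value
    vector is removed, measured as cosine distance and row-normalized."""
    n = x.shape[0]
    baseline = encoder_layer(x, Wq, Wk, Wv, Wo)
    scores = torch.zeros(n, n)
    for j in range(n):
        ablated = encoder_layer(x, Wq, Wk, Wv, Wo, zero_value_of=j)
        scores[:, j] = 1 - F.cosine_similarity(baseline, ablated, dim=-1)
    return scores / scores.sum(dim=-1, keepdim=True)

# Toy usage: 5 tokens, hidden size 16, random weights.
torch.manual_seed(0)
n, d = 5, 16
x = torch.randn(n, d)
Wq, Wk, Wv, Wo = (torch.randn(d, d) / d ** 0.5 for _ in range(4))
print(value_zeroing_scores(x, Wq, Wk, Wv, Wo))
```

In the full method, such scores are computed per layer of a trained model rather than on random weights, yielding a layer-wise map of how much each token's output representation draws on each context token.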

Hosein Mohebbi, Willem Zuidema, Grzegorz Chrupała, Afra Alishahi • 2023

Related benchmarks

Task                      | Dataset   | Metric              | Result | Rank
Faithfulness Evaluation   | SST2      | AUC π-Soft (NS)     | 0.325  | 27
Faithfulness Evaluation   | WikiBio   | AUC π-Soft-NS       | 0.218  | 27
Faithfulness Evaluation   | IMDB      | AUC π-Soft-NS       | 39.8   | 27
Faithfulness Evaluation   | BoolQ     | AUC π-Soft-NS       | 15.8   | 27
Faithfulness Evaluation   | TellMeWhy | AUC π-Soft-NS       | 0.19   | 27
Sentiment Classification  | IMDB      | Deletion Rate       | 25.1   | 20
Sentiment Classification  | SST2      | Deletion Robustness | 0.3755 | 20
