LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions

About

Why do gradient-based explanations struggle with Transformers, and how can we improve them? We identify gradient flow imbalances in Transformers that violate FullGrad-completeness, a critical property for attribution faithfulness that CNNs naturally possess. To address this issue, we introduce LibraGrad -- a theoretically grounded post-hoc approach that corrects gradient imbalances through pruning and scaling of backward paths, without changing the forward pass or adding computational overhead. We evaluate LibraGrad using three metric families: Faithfulness, which quantifies prediction changes under perturbations of the most and least relevant features; Completeness Error, which measures attribution conservation relative to model outputs; and Segmentation AP, which assesses alignment with human perception. Extensive experiments across 8 architectures, 4 model sizes, and 4 datasets show that LibraGrad universally enhances gradient-based methods, outperforming existing white-box methods -- including Transformer-specific approaches -- across all metrics. We demonstrate superior qualitative results through two complementary evaluations: precise text-prompted region highlighting on CLIP models and accurate class discrimination between co-occurring animals on ImageNet-finetuned models -- two settings on which existing methods often struggle. LibraGrad is effective even on the attention-free MLP-Mixer architecture, indicating potential for extension to other modern architectures. Our code is freely available at https://github.com/NightMachinery/LibraGrad.
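As a rough illustration of two of the metric families named above, the sketch below computes a completeness error and a deletion-style faithfulness curve on a toy model. It is not LibraGrad and not the paper's evaluation code. The names `completeness_error`, `deletion_curve`, `toy_model`, and `toy_gradient` are hypothetical, and the paper's exact definitions (normalization, perturbation baseline, aggregation) may differ. The toy network is a bias-free two-layer ReLU model, chosen as an assumption because Gradient x Input is complete for it by construction.

```python
import numpy as np

def completeness_error(attributions, output):
    """Absolute gap between the summed attributions and the model output."""
    return abs(float(np.sum(attributions)) - float(output))

def deletion_curve(model, x, attributions, baseline=0.0):
    """Model outputs as features are removed one by one, most relevant first."""
    order = np.argsort(-attributions)  # indices sorted by descending relevance
    perturbed = x.copy()
    outputs = [model(perturbed)]
    for idx in order:
        perturbed[idx] = baseline  # "delete" the feature by setting it to the baseline
        outputs.append(model(perturbed))
    return np.array(outputs)

# Toy bias-free ReLU network (an assumption for illustration only).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))
w2 = rng.normal(size=4)

def toy_model(x):
    return float(w2 @ np.maximum(W1 @ x, 0.0))

def toy_gradient(x):
    # Analytic gradient of toy_model with respect to x.
    active = (W1 @ x > 0).astype(float)
    return W1.T @ (w2 * active)

x = rng.normal(size=6)
attr = x * toy_gradient(x)  # Gradient x Input attribution
err = completeness_error(attr, toy_model(x))
curve = deletion_curve(toy_model, x, attr)
```

On this toy model the completeness error is zero up to floating-point rounding, because the network is piecewise linear with no bias terms. The deletion curve starts at the unperturbed output and ends at the output for an all-baseline input; a faithfulness score would typically summarize such a curve, for example by its area.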

Faridoun Mehri (1), Mahdieh Soleymani Baghshah (1), Mohammad Taher Pilehvar (2) ((1) Sharif University of Technology, (2) Cardiff University) • 2024

Related benchmarks

Task	Dataset	Result
Localization	ImageNet	AUPR@157.57	70
Attribution Faithfulness	ImageNet-1K ILSVRC2012 (val)	Deletion Score 60.8	40
Attribution Localization	ImageNet-1K ILSVRC2012 (val)	AUPR 156.53	40
Faithfulness Evaluation	ImageNet	Deletion Score 49.19	30
Attribution Faithfulness Evaluation	ImageNet (test)	Deletion Score 54.09	30
Attribution Faithfulness	ImageNet	Deletion Score 36.94	30
Localization	ImageNet (val)	AUPR148.78	30
Faithfulness Evaluation	ImageNet (val)	--	24
