
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions

About

Why do gradient-based explanations struggle with Transformers, and how can we improve them? We identify gradient flow imbalances in Transformers that violate FullGrad-completeness, a critical property for attribution faithfulness that CNNs naturally possess. To address this issue, we introduce LibraGrad -- a theoretically grounded post-hoc approach that corrects gradient imbalances through pruning and scaling of backward paths, without changing the forward pass or adding computational overhead. We evaluate LibraGrad using three metric families: Faithfulness, which quantifies prediction changes under perturbations of the most and least relevant features; Completeness Error, which measures attribution conservation relative to model outputs; and Segmentation AP, which assesses alignment with human perception. Extensive experiments across 8 architectures, 4 model sizes, and 4 datasets show that LibraGrad universally enhances gradient-based methods, outperforming existing white-box methods -- including Transformer-specific approaches -- across all metrics. We demonstrate superior qualitative results through two complementary evaluations: precise text-prompted region highlighting on CLIP models and accurate class discrimination between co-occurring animals on ImageNet-finetuned models -- two settings on which existing methods often struggle. LibraGrad is effective even on the attention-free MLP-Mixer architecture, indicating potential for extension to other modern architectures. Our code is freely available at https://github.com/NightMachinery/LibraGrad.
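To make the Completeness Error metric concrete, the following is a minimal sketch, not the paper's implementation. It assumes a simplified notion of completeness (the attributions should sum to the model output) and uses a hypothetical bias-free linear model; the function name `completeness_error` and the toy weights are illustrative assumptions. The full FullGrad formulation also accounts for bias terms, which this toy example omits.

```python
import numpy as np

def completeness_error(attributions, model_output):
    """Absolute gap between the summed attributions and the model output."""
    return abs(float(np.sum(attributions)) - float(model_output))

# Hypothetical bias-free linear model f(x) = w . x, for illustration only.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
output = float(w @ x)

# Gradient x Input: the gradient of f with respect to x is w,
# so each feature's attribution is w_i * x_i and the sum equals f(x).
grad_times_input = w * x

# An imbalanced backward pass, mimicked here by doubling one gradient entry,
# no longer conserves the output.
imbalanced = np.array([0.5, -2.0, 2.0]) * x

err_balanced = completeness_error(grad_times_input, output)
err_imbalanced = completeness_error(imbalanced, output)
```

In this toy setting the balanced attribution conserves the output exactly, while the distorted gradient leaves a nonzero gap; that gap is the kind of quantity a completeness metric tracks.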

Faridoun Mehri (1), Mahdieh Soleymani Baghshah (1), Mohammad Taher Pilehvar (2) ((1) Sharif University of Technology, (2) Cardiff University) • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Localization | ImageNet | AUPR@1 57.57 | 70 |
| Attribution Faithfulness | ImageNet-1K ILSVRC2012 (val) | Deletion Score 60.8 | 40 |
| Attribution Localization | ImageNet-1K ILSVRC2012 (val) | AUPR 1 56.53 | 40 |
| Faithfulness Evaluation | ImageNet | Deletion Score 49.19 | 30 |
| Attribution Faithfulness Evaluation | ImageNet (test) | Deletion Score 54.09 | 30 |
| Attribution Faithfulness | ImageNet | Deletion Score 36.94 | 30 |
| Localization | ImageNet (val) | AUPR1 48.78 | 30 |
| Faithfulness Evaluation | ImageNet (val) | -- | 24 |
