DAVE: Distribution-aware Attribution via ViT Gradient Decomposition

About

Vision Transformers (ViTs) have become a dominant architecture in computer vision, yet producing stable and high-resolution attribution maps for these models remains challenging. Architectural components such as patch embeddings and attention routing often introduce structured artifacts in pixel-level explanations, causing many existing methods to rely on coarse patch-level attributions. We introduce DAVE (Distribution-aware Attribution via ViT Gradient Decomposition), a mathematically grounded attribution method for ViTs based on a structured decomposition of the input gradient. By exploiting architectural properties of ViTs, DAVE isolates locally equivariant and stable components of the effective input-output mapping. It separates these from architecture-induced artifacts and other sources of instability.

Adam Wróbel, Siddhartha Gairola, Jacek Tabor, Bernt Schiele, Bartosz Zieliński, Dawid Rymarczyk • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Localization | ImageNet-1k (val) | -- | 79 |
| Attribution | ImageNet-S (val) | AL 0.48 | 17 |
| Image Attribution Evaluation | PASCAL VOC 2012 (test) | AL 0.312 | 15 |
| Attribution Localization | ImageNet-1k (val) | Grid PG 88.43 | 10 |
