Learn to Rank: Visual Attribution by Learning Importance Ranking
About
Interpreting the decisions of complex computer vision models is crucial for establishing trust and accountability, especially in safety-critical domains. An established approach to interpretability is to generate visual attribution maps that highlight the regions of the input most relevant to the model's prediction. However, existing methods face a three-way trade-off. Propagation-based approaches are efficient, but they can be biased and architecture-specific. Perturbation-based methods are causally grounded, yet they are expensive and, for vision transformers, often yield coarse, patch-level explanations. Learning-based explainers are fast but usually optimize surrogate objectives or distill from heuristic teachers. We propose a learning scheme that instead optimizes deletion and insertion metrics directly. Since these metrics depend on non-differentiable sorting and ranking, we frame them as permutation learning and replace the hard sorting with a differentiable relaxation based on Gumbel-Sinkhorn. This enables end-to-end training through attribution-guided perturbations of the target model. At inference time, our method produces dense, pixel-level attributions in a single forward pass, with optional few-step gradient refinement. Our experiments demonstrate consistent quantitative improvements and sharper, boundary-aligned explanations, particularly for transformer-based vision models.
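To make the relaxation mentioned above more concrete, here is a minimal NumPy sketch of a Gumbel-Sinkhorn operator. It is an illustration under stated assumptions, not the authors' implementation: the function name `gumbel_sinkhorn`, its parameters (`tau`, `n_iters`, `noise_scale`), and the way the logits are built from attribution scores (negative squared distance to evenly spaced rank anchors) are all hypothetical choices made for this example.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis (keeps dimensions)."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def gumbel_sinkhorn(log_alpha, tau=1.0, n_iters=20, noise_scale=1.0, rng=None):
    """Relax a hard permutation into a (nearly) doubly stochastic matrix.

    log_alpha:   (n, n) logits; entry [i, j] scores "item i takes rank j".
    tau:         temperature; smaller values approach a hard permutation.
    noise_scale: multiplier on the Gumbel noise (0.0 gives plain Sinkhorn).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample standard Gumbel noise via the inverse-CDF trick.
    u = np.clip(rng.uniform(size=log_alpha.shape), 1e-9, 1.0 - 1e-9)
    gumbel = -np.log(-np.log(u))
    log_p = (log_alpha + noise_scale * gumbel) / tau
    # Alternate row and column normalization in log space.
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)
        log_p = log_p - _logsumexp(log_p, axis=0)
    return np.exp(log_p)


# Toy example (assumed setup): three attribution scores and three rank
# anchors ordered from "most important" (1.0) to "least important" (0.0).
scores = np.array([0.9, 0.1, 0.5])
anchors = np.linspace(1.0, 0.0, num=scores.size)
log_alpha = -(scores[:, None] - anchors[None, :]) ** 2

# Noise is switched off here so the toy example is deterministic.
P = gumbel_sinkhorn(log_alpha, tau=0.01, n_iters=20, noise_scale=0.0)
ranks = P.argmax(axis=1)  # soft assignment collapsed to a rank per item
```

At low temperature the rows of `P` concentrate on a single rank each, so `P` behaves like a sorting permutation while remaining differentiable with respect to the scores in an autodiff framework. A training loop could, under the same assumptions, use such a matrix to order the perturbation steps of a deletion or insertion curve.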
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Attribution Evaluation | ImageNet (val) | POS Score | 0.1907 | 18 |
| Visual Attribution | ImageNet ILSVRC-2012 (val) | Deletion Score | 0.1126 | 10 |
| Visual Attribution | ImageNet Predicted Class target ILSVRC 2012 (val) | Deletion Score | 0.1381 | 10 |
| Visual Attribution | ImageNet (val) | Deletion Score | 0.1363 | 10 |
| Visual Attribution | ImageNet Predicted Class (val) | Deletion Score | 17.41 | 7 |
| Visual Attribution | ImageNet Ground Truth Class (val) | Deletion Score | 0.1624 | 7 |