Enhancing Interpretability for Vision Models via Shapley Value Optimization
About
Deep neural networks have demonstrated remarkable performance across various domains, yet their decision-making processes remain opaque. Although many explanation methods aim to make DNNs less opaque, they exhibit significant limitations: post-hoc explanation methods often struggle to faithfully reflect model behavior, while self-explaining neural networks sacrifice performance and compatibility due to their specialized architectural designs. To address these challenges, we propose a novel self-explaining framework that integrates Shapley value estimation as an auxiliary task during training, which achieves two key advancements: 1) a fair allocation of the model's prediction scores to image patches, ensuring that explanations inherently align with the model's decision logic, and 2) enhanced interpretability with only minor structural modifications, preserving model performance and compatibility. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art interpretability.
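To make "a fair allocation of the model's prediction scores to image patches" concrete, the sketch below computes exact Shapley values for a toy scoring function over four patches. It is only an illustration of the Shapley value itself, not the training procedure described above: `shapley_values`, `toy_score`, and `PATCH_SCORES` are hypothetical names, and the toy score stands in for a real model's output on a masked image. Exact enumeration is exponential in the number of patches, which is presumably why an estimator is needed in practice.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_players):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(n_players):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (
                factorial(size) * factorial(n_players - size - 1)
                / factorial(n_players)
            )
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of patch i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical per-patch contributions (assumption, not from the paper).
PATCH_SCORES = [0.5, 0.25, 0.0, 0.125]


def toy_score(visible):
    """Toy stand-in for a model score given the set of visible patch indices."""
    base = sum(PATCH_SCORES[p] for p in visible)
    # Patches 0 and 1 interact: seeing both adds an extra 0.125.
    bonus = 0.125 if {0, 1} <= visible else 0.0
    return base + bonus


phi = shapley_values(toy_score, 4)
full = toy_score(frozenset(range(4)))
empty = toy_score(frozenset())
print(phi, sum(phi), full - empty)
```

The attributions sum to the difference between the score with all patches visible and the score with none (the efficiency property), and the interaction bonus is split evenly between patches 0 and 1, which is the sense in which the allocation is "fair".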
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Image Retrieval | Flickr30K | R@1 | 70.8 | 460 |
| Image-to-Text Retrieval | Flickr30K | R@1 | 85.5 | 379 |
| Image-to-Text Retrieval | MS-COCO (test) | R@1 | 21.83 | 99 |
| Text-to-Image Retrieval | MS-COCO | R@5 | 67.3 | 79 |
| Text-to-Image Retrieval | MS-COCO (test) | R@1 | 16.79 | 66 |
| Image-to-Text Retrieval | MS-COCO | R@5 | 80.9 | 65 |
| Explanation Faithfulness | ImageNet 2015 (test) | AOPC | 0.806 | 22 |
| Segmentation | ImageNet segmentation | Pixel Accuracy | 85.78 | 22 |
| Image Retrieval | MS-COCO 2014 (test) | Recall@1 (Del) | 11.75 | 9 |
| Text Retrieval | MS-COCO 2014 (test) | Deliberate R@1 | 14.01 | 9 |