Explaining Object Detectors via Collective Contribution of Pixels
About
Visual explanations for object detectors are crucial for enhancing their reliability. Object detectors identify and localize instances by assessing multiple visual features collectively. When generating explanations, overlooking these collective influences in detections may lead to missing compositional cues or capturing spurious correlations. However, existing methods typically focus solely on individual pixel contributions, neglecting the collective contribution of multiple pixels. To address this limitation, we propose a game-theoretic method based on Shapley values and interactions to explicitly capture both individual and collective pixel contributions. Our method provides explanations for both bounding box localization and class determination, highlighting regions crucial for detection. Extensive experiments demonstrate that the proposed method identifies important regions more accurately than state-of-the-art methods. The code is available at https://github.com/tttt-0814/VX-CODE
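To make the distinction between individual and collective contributions concrete, the following is a minimal sketch of exact Shapley values and the pairwise Shapley interaction index on a toy game. It is not the authors' implementation: the three "patches", the function names, and `toy_score` (a stand-in for a detector's score on an image where only the given patches are visible) are all assumptions made for illustration, and exhaustive enumeration like this is only feasible for a handful of players.

```python
from itertools import combinations
from math import factorial


def subsets(players):
    """Yield every subset of `players` as a frozenset."""
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)


def shapley_value(v, n, i):
    """Exact Shapley value of player i in an n-player game with value function v."""
    others = [p for p in range(n) if p != i]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        total += weight * (v(s | {i}) - v(s))
    return total


def shapley_interaction(v, n, i, j):
    """Exact pairwise Shapley interaction index between players i and j."""
    others = [p for p in range(n) if p not in (i, j)]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 2) / factorial(n - 1)
        # Joint effect of adding i and j minus their separate effects.
        delta = v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
        total += weight * delta
    return total


def toy_score(patches):
    """Hypothetical detector score: the object is detected only when
    patches 0 and 1 are both visible; patch 2 is irrelevant."""
    return 1.0 if {0, 1} <= patches else 0.0


if __name__ == "__main__":
    n = 3
    for i in range(n):
        print(f"Shapley value of patch {i}: {shapley_value(toy_score, n, i):.3f}")
    for i, j in combinations(range(n), 2):
        value = shapley_interaction(toy_score, n, i, j)
        print(f"Interaction of patches {i} and {j}: {value:.3f}")
```

In this toy game, patches 0 and 1 each receive half of the total score individually, while their interaction index is strongly positive, which signals that the score comes from the two patches acting together rather than from either one alone. An attribution that looks only at individual values would not reveal that dependency.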
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Explanation | MS-COCO | -- | -- | 30 |
| Object Detection Explanation Faithfulness | MS-COCO | Insertion | 92.6 | 25 |
| Faithfulness of identified regions | Pascal VOC | Insertion | 85 | 18 |
| Object Detection | MS-COCO | Insertion Score | 92.2 | 11 |
| Visual Explanation Faithfulness | MS-COCO Misclassification failure cases (test) | Insertion (Ins) | 73.8 | 9 |
| Visual Explanation Faithfulness | MS-COCO Mislocalization failure cases (test) | Insertion Score | 78.7 | 9 |
| Energy-based Pointing Game | MS-COCO | EPG (B) | 64.4 | 8 |
| Pointing game | MS-COCO | PG (B) | 96.5 | 8 |
| Interaction Score Analysis | COCO (300 instances) | Interaction Score | 5.1 | 7 |
| Object Detection Explanation Faithfulness | COCO 100 detected instances | Insertion Score (Ins) | 93.3 | 7 |