i-WiViG: Interpretable Window Vision GNN

About

Vision graph neural networks have emerged as a popular approach for modeling the global and spatial context for image recognition. However, a significant drawback of these methods is that they do not offer an inherent interpretation of the spatial interactions relevant to their predictions. We address this problem by introducing i-WiViG, an approach that enables interpretable model reasoning based on a sparse subgraph in the image. i-WiViG is based on two key postulates: 1) constraining the graph nodes' receptive field to disjoint local windows in the image, and 2) an inherently interpretable graph bottleneck with learnable sparse attention that identifies the relevant interactions among the local image windows. We evaluate our approach on both scene classification and regression tasks using natural and remote sensing imagery. Our results, supported by quantitative and qualitative evidence, demonstrate that the method delivers semantic, intuitive, and faithful explanations through the identified subgraphs. Furthermore, extensive experiments confirm that it achieves performance competitive with its black-box counterparts, even on datasets exhibiting strong texture bias. The implementation is available at https://github.com/zhu-xlab/i-WiViG.
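
The two postulates in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes a square window size that divides the image evenly, and it uses a hard top-k selection over made-up edge scores as a simple stand-in for the learnable sparse attention. The names `window_partition` and `top_k_edges` are hypothetical.

```python
def window_partition(height, width, window):
    """Split a height x width pixel grid into disjoint window x window tiles.

    Returns one (row_start, row_end, col_start, col_end) box per graph node.
    Assumption: the image size is divisible by the window size.
    """
    if height % window or width % window:
        raise ValueError("image size must be divisible by the window size")
    return [
        (r, r + window, c, c + window)
        for r in range(0, height, window)
        for c in range(0, width, window)
    ]


def top_k_edges(scores, k):
    """Keep the k highest-scoring edges; scores maps (src, dst) -> float.

    Hard top-k is an illustrative substitute for learned sparse attention.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [edge for edge, _ in ranked[:k]]


# An 8x8 image with 4x4 windows yields four non-overlapping nodes.
nodes = window_partition(8, 8, 4)

# Made-up attention scores between window nodes, for illustration only.
scores = {(0, 1): 0.9, (0, 2): 0.1, (1, 3): 0.7, (2, 3): 0.2}

# The retained edges form the sparse subgraph offered as the explanation.
subgraph = top_k_edges(scores, 2)
```

Because the windows do not overlap, each retained edge maps back to exactly two image regions, which is what would make such a subgraph readable as an explanation.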

Ivica Obadic, Dmitry Kangin, Adrian Höhl, Dario Oliveira, Plamen P Angelov, Xiao Xiang Zhu • 2025

Related benchmarks

Task	Dataset	Result
Image Classification	SUN397 (test)	Top-1 Accuracy: 58	251
Scene recognition	SUN 397 (test)	Top-1 Accuracy: 58	35
Scene Classification	RESISC-45 (test)	OA: 92	32
Regression	Liveability (test)	R²: 0.47	9

