AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers

About

The Average Gradient Outer Product (AGOP) governs feature learning in neural networks: the Neural Feature Ansatz states that the weight Gram matrix at each layer aligns with the corresponding AGOP matrix computed over the training distribution. We ask a complementary question: can this same quantity serve as a post-hoc attribution method for explaining individual predictions? We introduce AGOP-Weighted, a novel attribution method that multiplies the per-sample gradient by sqrt(diag(M) / max diag(M)), where M is the AGOP matrix. This factor acts as a training-distribution prior that suppresses gradient noise and amplifies consistently important pixels, a combination not present in any prior attribution method. We formalise two companion variants, AGOP-Local (the per-sample gradient, equivalent to VanillaGrad) and AGOP-Global (diag(M) used directly as a saliency map), and implement an efficient training-time accumulation hook. AGOP-Global then requires zero inference cost (a disk lookup), while AGOP-Weighted requires only a single gradient pass. We conduct the first rigorous comparison of AGOP attribution against Integrated Gradients (IG), SmoothGrad, GradCAM, and VanillaGrad across two benchmarks with pixel-level ground truth: (i) the synthetic XAI-TRIS benchmark (four classification scenarios, 8x8 images, CNN8by8) and (ii) the photorealistic CLEVR-XAI benchmark (ResNet-18 fine-tuned from ImageNet). AGOP-Weighted achieves 44% higher mIoU than IG on linear tasks; AGOP-Global achieves 7x higher mIoU than IG on multiplicative tasks (where IG falls below random) at zero inference cost. Both findings generalise to ResNet-18 on CLEVR-XAI (+18% and +37% respectively). We further show that GradCAM fails on small-resolution images due to spatial resolution collapse, and that the quality of diag(M) improves monotonically throughout training, even after classification accuracy has plateaued.
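The three variants named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes M is the mean over training samples of the outer product of the input gradient with itself, so diag(M) is the mean squared gradient per input dimension. The toy model f(x) = tanh(w . x), its weights, and all function names are hypothetical stand-ins for a trained classifier and its autograd gradient.

```python
import numpy as np

def input_gradient(w, x):
    """Gradient of the toy model f(x) = tanh(w . x) with respect to the input x."""
    return (1.0 - np.tanh(w @ x) ** 2) * w

def agop_diagonal(w, train_x):
    """diag(M), assuming M = mean over training samples of grad f(x) grad f(x)^T."""
    grads = np.stack([input_gradient(w, x) for x in train_x])
    return np.mean(grads ** 2, axis=0)

def agop_weighted(grad, diag_m):
    """Per-sample gradient scaled by sqrt(diag(M) / max diag(M))."""
    # Assumes diag_m.max() > 0; a real implementation would guard this division.
    prior = np.sqrt(diag_m / diag_m.max())
    return grad * prior

rng = np.random.default_rng(0)
w = np.array([2.0, 0.0, -1.0, 0.5])      # hypothetical weights; input 1 is ignored
train_x = rng.normal(size=(256, 4))      # stand-in for the training set

diag_m = agop_diagonal(w, train_x)       # AGOP-Global: one map shared by all inputs
x = rng.normal(size=4)                   # a single sample to explain
local = input_gradient(w, x)             # AGOP-Local: the plain per-sample gradient
weighted = agop_weighted(local, diag_m)  # AGOP-Weighted: gradient times the prior
```

Because the prior lies in [0, 1], the weighted map never exceeds the raw gradient in magnitude, and input dimensions whose gradient is zero across the training set are zeroed out for every sample. In this sketch diag_m is computed once and could be cached, which matches the abstract's description of AGOP-Global as a lookup at inference time.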

Raj Kiran Gupta Katakam • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Attribution	XAI-TRIS Scenario 2 (Multiplicative/uncorrelated)	PG	50.6	9
Feature Attribution	XAI-TRIS Scenario 4 - XOR	PG	0.497	9
Attribution Map Evaluation	XAI-TRIS Scenario 3 (Translations+Rotations uncorrelated)	PG	77.4	9
Attribution Performance Evaluation	XAI-TRIS Scenario 1 (Linear uncorrelated)	PG	66.7	9
Post-hoc Attribution	CLEVR-XAI (val)	PG	28.2	9
