Pulling Back the Curtain on Deep Networks

About

In linear models, visualizing a weight vector naturally reveals the model's preferred input direction, but extending this intuition to deep networks via gradients or gradient ascent often yields brittle or adversarial-looking features. We argue that deep networks are better understood as input-conditioned affine operators, whose natural adjoint action pulls a neuron's preferred direction back to input space. We further refine this representation by backward-only softening and iterative enhancement to reconstruct coherent local structures encoded by the target neuron. This provides a unifying perspective on previously disparate ideas such as SmoothGrad, B-cos-style alignment, and Feature Accentuation. The resulting Semantic Pullbacks (SP) generate perceptually aligned, class-conditional post-hoc explanations that emphasize semantically meaningful features, facilitate coherent counterfactual perturbations, and remain theoretically grounded. Across convolutional architectures (ResNet50, VGG) and transformer-based models (PVT), Semantic Pullbacks achieve the best overall trade-off across faithfulness, stability, and target-sensitivity benchmarks, while remaining general, computationally efficient, and readily integrable into existing deep learning pipelines.
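The affine-operator view described above can be illustrated on a toy ReLU network. The NumPy sketch below is not the authors' Semantic Pullbacks implementation. The network, the function names (`local_affine`, `pullback`, `soft_pullback`, `smooth_pullback`) and the sigmoid-based softening are hypothetical choices made only for illustration. For a piecewise-linear network, freezing the ReLU on/off pattern at an input x gives an exact local affine map f(x) = A(x) x + c(x). Applying the adjoint A(x)^T to a one-hot output direction e_k then yields neuron k's preferred input direction, which for such networks coincides with the ordinary gradient.

```python
import numpy as np

# Toy stand-in for a deep network (hypothetical, not one of the paper's models):
# a two-layer ReLU MLP f(x) = W2 relu(W1 x + b1) + b2.
rng = np.random.default_rng(0)
d_in, d_hid, d_out = 8, 16, 3
W1 = rng.standard_normal((d_hid, d_in))
b1 = rng.standard_normal(d_hid)
W2 = rng.standard_normal((d_out, d_hid))
b2 = rng.standard_normal(d_out)


def forward(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2


def local_affine(x):
    """Return (A, c) such that forward(x) == A @ x + c, with the ReLU gates frozen at x."""
    gate = (W1 @ x + b1 > 0).astype(float)  # on/off pattern of the hidden units at x
    A = W2 @ (gate[:, None] * W1)           # W2 diag(gate) W1
    c = W2 @ (gate * b1) + b2
    return A, c


def pullback(x, k):
    """Adjoint action A(x)^T e_k: output neuron k's direction mapped back to input space."""
    A, _ = local_affine(x)
    return A.T @ np.eye(d_out)[k]


def soft_pullback(x, k, tau=1.0):
    """Backward-only softening (an assumed variant): the forward pass is untouched, but the
    adjoint uses sigmoid(pre / tau) in place of the hard 0/1 ReLU gate."""
    pre = W1 @ x + b1
    soft_gate = 0.5 * (1.0 + np.tanh(pre / (2.0 * tau)))  # numerically stable sigmoid
    A_soft = W2 @ (soft_gate[:, None] * W1)
    return A_soft.T @ np.eye(d_out)[k]


def smooth_pullback(x, k, sigma=0.1, n=32):
    """SmoothGrad-style averaging of pullbacks over Gaussian-perturbed copies of x."""
    noisy = x + sigma * rng.standard_normal((n, x.shape[0]))
    return np.mean([pullback(xi, k) for xi in noisy], axis=0)


x = rng.standard_normal(d_in)
A, c = local_affine(x)
print(np.allclose(forward(x), A @ x + c))  # True: f is exactly affine around x
```

As `tau` approaches 0, `soft_pullback` reduces to `pullback`. Larger `tau` also passes credit through units that are nearly active, which is one plausible reading of "backward-only softening", not necessarily the paper's exact choice.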

Maciej Satkiewicz, Roberto Corizzo, Marcin Pietroń • 2025

Related benchmarks

Task	Dataset	Result
Feature Attribution	ImageNet 1000 samples (val)	Infidelity: 5.384	24
Model Explanation Evaluation	ImageNet 1000 samples (val)	Infidelity: 5.616	12
Explainability	ImageNet 1000 samples (val)	Infidelity: 1.634	11
