Pulling Back the Curtain on Deep Networks
About
In linear models, visualizing a weight vector naturally reveals the model's preferred input direction, but extending this intuition to deep networks via gradients or gradient ascent often yields brittle or adversarial-looking features. We argue that deep networks are better understood as input-conditioned affine operators, whose natural adjoint action pulls a neuron's preferred direction back to input space. We further refine this representation by backward-only softening and iterative enhancement to reconstruct coherent local structures encoded by the target neuron. This provides a unifying perspective on previously disparate ideas such as SmoothGrad, B-cos-style alignment, and Feature Accentuation. The resulting Semantic Pullbacks (SP) generate perceptually aligned, class-conditional post-hoc explanations that emphasize semantically meaningful features, facilitate coherent counterfactual perturbations, and remain theoretically grounded. Across convolutional architectures (ResNet50, VGG) and transformer-based models (PVT), Semantic Pullbacks achieve the best overall trade-off across faithfulness, stability, and target-sensitivity benchmarks, while remaining general, computationally efficient, and readily integrable into existing deep learning pipelines.
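The "input-conditioned affine operator" view can be illustrated on a toy ReLU network. The sketch below is not the authors' implementation: the weights, the input, and the helper names (`matvec`, `forward`, `pullback`) are all made up for illustration. It covers only the basic adjoint step and omits the backward-only softening and iterative enhancement mentioned in the abstract. For a fixed input, the ReLU gates are frozen, so the network acts as an affine map, and applying the transposed (adjoint) operator to a one-hot output vector gives that neuron's direction in input space.

```python
# Toy two-layer ReLU network; every number here is an illustrative assumption.
W1 = [[1.0, -2.0, 0.5],
      [0.0, 1.0, -1.0],
      [-1.5, 0.5, 2.0],
      [2.0, 1.0, 0.0]]
b1 = [0.1, -0.2, 0.0, 0.3]
W2 = [[1.0, -1.0, 0.5, 2.0],
      [0.5, 2.0, -1.0, 0.0]]
b2 = [0.05, -0.1]


def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def forward(x):
    """Return the network output and the ReLU gate pattern selected by x."""
    pre = [p + b for p, b in zip(matvec(W1, x), b1)]
    gates = [1.0 if p > 0 else 0.0 for p in pre]
    hidden = [g * p for g, p in zip(gates, pre)]
    out = [o + b for o, b in zip(matvec(W2, hidden), b2)]
    return out, gates


def pullback(x, k):
    """Pull output neuron k back to input space at x.

    Returns (direction, offset) such that, with the gates frozen at x,
    out[k] == dot(direction, x) + offset.
    """
    _, gates = forward(x)
    # Adjoint of the second layer followed by the gate mask: gated row k of W2.
    back = [w * g for w, g in zip(W2[k], gates)]
    # Adjoint of the first layer: W1^T applied to the gated row.
    direction = [sum(W1[j][i] * back[j] for j in range(len(W1)))
                 for i in range(len(x))]
    offset = sum(bk * b for bk, b in zip(back, b1)) + b2[k]
    return direction, offset


x = [0.5, -1.0, 2.0]
out, gates = forward(x)
direction, offset = pullback(x, k=0)
# The local affine map reproduces the network output exactly at x.
recon = sum(d * xi for d, xi in zip(direction, x)) + offset
```

In this piecewise-linear setting, `direction` coincides with the ordinary input gradient of output `k`. That is the quantity the abstract describes as often brittle, which is why the refinement steps are applied on top of it.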
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Feature Attribution | ImageNet 1000 samples (val) | Infidelity | 5.384 | 24 |
| Model Explanation Evaluation | ImageNet 1000 samples (val) | Infidelity | 5.616 | 12 |
| Explainability | ImageNet 1000 samples (val) | Infidelity | 1.634 | 11 |