
Pulling Back the Curtain on Deep Networks

About

In linear models, visualizing a weight vector naturally reveals the model's preferred input direction, but extending this intuition to deep networks via gradients or gradient ascent often yields brittle or adversarial-looking features. We argue that deep networks are better understood as input-conditioned affine operators, whose natural adjoint action pulls a neuron's preferred direction back to input space. We further refine this representation by backward-only softening and iterative enhancement to reconstruct coherent local structures encoded by the target neuron. This provides a unifying perspective on previously disparate ideas such as SmoothGrad, B-cos-style alignment, and Feature Accentuation. The resulting Semantic Pullbacks (SP) generate perceptually aligned, class-conditional post-hoc explanations that emphasize semantically meaningful features, facilitate coherent counterfactual perturbations, and remain theoretically grounded. Across convolutional architectures (ResNet50, VGG) and transformer-based models (PVT), Semantic Pullbacks achieve the best overall trade-off across faithfulness, stability, and target-sensitivity benchmarks, while remaining general, computationally efficient, and readily integrable into existing deep learning pipelines.
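The core idea can be illustrated on a toy model. On a fixed input, a ReLU network acts as an affine map, and the transpose (adjoint) of that map pulls a neuron's one-hot direction back to input space.

The following is a minimal NumPy sketch under our own assumptions. The model is a randomly initialized two-layer ReLU MLP. The weights `W1`, `b1`, `W2`, `b2` and the helpers `affine_operator`, `pullback` and `softened_pullback` are hypothetical names. A sigmoid gate with a `temperature` parameter stands in for one possible form of backward-only softening. This is not the authors' Semantic Pullbacks implementation, which additionally uses iterative enhancement and targets convolutional and transformer models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: x -> ReLU(W1 x + b1) -> W2 h + b2
d_in, d_hid, d_out = 6, 8, 3
W1, b1 = rng.normal(size=(d_hid, d_in)), rng.normal(size=d_hid)
W2, b2 = rng.normal(size=(d_out, d_hid)), rng.normal(size=d_out)

def forward(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def affine_operator(x):
    """Return (A, c) such that forward(x) == A @ x + c.
    A and c depend on x only through the ReLU gates active at x."""
    gate = (W1 @ x + b1 > 0).astype(float)  # 1.0 for active hidden units, else 0.0
    A = W2 @ (gate[:, None] * W1)           # W2 diag(gate) W1
    c = W2 @ (gate * b1) + b2
    return A, c

def pullback(x, k):
    """Adjoint action: map the one-hot direction e_k of output neuron k back to input space.
    For a piecewise-linear network this equals the input gradient of neuron k (almost everywhere)."""
    A, _ = affine_operator(x)
    e_k = np.zeros(A.shape[0])
    e_k[k] = 1.0
    return A.T @ e_k

def softened_pullback(x, k, temperature=0.5):
    """Hypothetical backward-only softening (an assumption, not the paper's exact rule):
    the forward gates are left untouched; only the backward pass replaces the hard
    ReLU gate with a sigmoid of the pre-activation, written via tanh for numerical stability."""
    z = W1 @ x + b1
    soft_gate = 0.5 * (1.0 + np.tanh(z / (2.0 * temperature)))  # == sigmoid(z / temperature)
    return W1.T @ (soft_gate * W2[k])

# Usage: the local affine map reproduces the network output at x.
x = rng.normal(size=d_in)
A, c = affine_operator(x)
print(np.allclose(forward(x), A @ x + c))  # prints True
print(pullback(x, 0))                      # input-space direction for output neuron 0
print(softened_pullback(x, 0))             # softened counterpart of the same direction
```

Without softening, the pullback coincides with the plain input gradient of the chosen neuron. This links the affine-operator view to gradient saliency. The softened variant leaves the forward computation intact and only reweights which hidden units contribute on the way back. As the temperature goes to zero, it approaches the hard pullback.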

Maciej Satkiewicz, Roberto Corizzo, Marcin Pietroń (2025)

Related benchmarks

Task | Dataset | Metric | Result | Rank
-----|---------|--------|--------|-----
Feature Attribution | ImageNet 1000 samples (val) | Infidelity | 5.384 | 24
Model Explanation Evaluation | ImageNet 1000 samples (val) | Infidelity | 5.616 | 12
Explainability | ImageNet 1000 samples (val) | Infidelity | 1.634 | 11
