Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

About

We present Universal Sparse Autoencoders (USAEs), a framework for uncovering and aligning interpretable concepts spanning multiple pretrained deep neural networks. Unlike existing concept-based interpretability methods, which focus on a single model, USAEs jointly learn a universal concept space that can reconstruct and interpret the internal activations of multiple models at once. Our core insight is to train a single, overcomplete sparse autoencoder (SAE) that ingests activations from any model and decodes them to approximate the activations of any other model under consideration. By optimizing a shared objective, the learned dictionary captures common factors of variation (concepts) across different tasks, architectures, and datasets. We show that USAEs discover semantically coherent and important universal concepts across vision models, ranging from low-level features (e.g., colors and textures) to higher-level structures (e.g., parts and objects). Overall, USAEs provide a powerful new method for interpretable cross-model analysis and offer novel applications, such as coordinated activation maximization, that open avenues for deeper insights in multi-model AI systems.

Harrish Thasarathan, Julian Forsyth, Thomas Fel, Matthew Kowal, Konstantinos G. Derpanis • 2025
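
To make the idea in the abstract concrete, here is a minimal NumPy sketch of a cross-model sparse autoencoder: activations from one model are encoded into a shared, overcomplete concept space, and the resulting sparse code is decoded into the activation space of every model. Several details are assumptions made for illustration and are not stated in the abstract: the per-model encoder and decoder matrices, the ReLU followed by top-k sparsity rule, the mean-squared-error objective, and every name used (`ToyUSAE`, `top_k`, `model_a`, `model_b`). Training is omitted; only the forward pass and the loss are shown.

```python
import numpy as np

rng = np.random.default_rng(0)


def top_k(z, k):
    """Keep the k largest entries of each row and zero out the rest."""
    idx = np.argpartition(z, -k, axis=1)[:, -k:]
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=1), axis=1)
    return out


class ToyUSAE:
    """Illustrative only: one shared concept space, hypothetical per-model maps."""

    def __init__(self, model_dims, n_concepts, k):
        self.k = k
        # Assumption: each model gets its own linear map into and out of
        # the shared concept space.
        self.enc = {m: rng.normal(0, 0.1, (d, n_concepts)) for m, d in model_dims.items()}
        self.dec = {m: rng.normal(0, 0.1, (n_concepts, d)) for m, d in model_dims.items()}

    def encode(self, model, acts):
        # Non-negative codes with at most k active concepts per sample.
        z = np.maximum(acts @ self.enc[model], 0.0)
        return top_k(z, self.k)

    def decode(self, model, codes):
        return codes @ self.dec[model]

    def cross_model_loss(self, source, batch):
        # Encode the source model's activations once, then reconstruct the
        # activations of every model (including the source) from that code.
        z = self.encode(source, batch[source])
        return sum(np.mean((self.decode(m, z) - a) ** 2) for m, a in batch.items())


dims = {"model_a": 16, "model_b": 24}           # hypothetical activation widths
usae = ToyUSAE(dims, n_concepts=64, k=8)        # 64 > 16 and 24: overcomplete
batch = {m: rng.normal(size=(4, d)) for m, d in dims.items()}
codes = usae.encode("model_a", batch["model_a"])
loss = usae.cross_model_loss("model_a", batch)
```

In a training loop one would presumably vary the source model across steps so that every encoder learns to produce codes that all decoders can read; that schedule is likewise an assumption here.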

Related benchmarks

Task	Dataset	Result
Label Purity	Open Images	Label Purity: 64.26	30
Feature Reconstruction	Open Images clip_txt Original Target (test)	R^2 (variance-weighted): 0.616	9
Feature Reconstruction	dino Original Target Open Images (test)	R^2 (variance-weighted): 0.111	9
Feature Reconstruction	clip_img Original Target Open Images (test)	R^2 (variance-weighted): 0.506	9
Concept recovery probing (1D logistic probe)	Open Images 432 binary tasks (test)	CLIP Image Score: 0.6372	5
Concept alignment	Open Images hierarchy depth 5	Mean Jaccard Similarity: 0.2166	5
Concept alignment	ImageNet	--	3
Concept alignment	DTD	--	3
Concept alignment	CelebA	--	3
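
The Feature Reconstruction rows report a variance-weighted R^2. The page does not say how the benchmark computes it, so the following is only a sketch of the standard definition, in which per-feature residuals and total sums of squares are pooled across all feature dimensions (the same convention as scikit-learn's `r2_score` with `multioutput="variance_weighted"`). The function name and the toy arrays are illustrative.

```python
import numpy as np


def variance_weighted_r2(y_true, y_pred):
    """R^2 pooled over feature dimensions, weighting each by its variance."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Per-feature residual and total sums of squares (rows are samples).
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    # Pooling the sums is equivalent to a variance-weighted mean of per-feature R^2.
    return 1.0 - ss_res.sum() / ss_tot.sum()


target = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
perfect = variance_weighted_r2(target, target)
mean_only = variance_weighted_r2(target, np.tile(target.mean(axis=0), (3, 1)))
```

Under this definition a perfect reconstruction scores 1 and a reconstruction that only predicts each feature's mean scores 0, which gives a reference scale for the values in the table.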
