Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression

About

Sparse autoencoders (SAEs) have become a central tool for interpreting language models. However, two key SAE analyses remain difficult to scale: (1) matching semantically similar features across layers and (2) compressing large feature circuits into interpretable supernodes. Although these have been treated as separate problems, we show that both are instances of a more fundamental challenge, which we frame as the estimation of semantic distances between SAE features that lie on different activation manifolds. We introduce a distributional framework for this problem, in which each feature is represented not by a single decoder vector, as is common in the literature, but by an activation-weighted distribution over the hidden states that express it. By projecting these distributions into a shared reference space and comparing them with the Wasserstein distance, our method provides a unified semantic metric for cross-layer feature comparison. We prove that our representation is invariant to activation rescaling, stable under perturbations, and recovers true matches under finite-sample margin conditions. Empirically, our method outperforms decoder-vector and LLM-based baselines and captures subtle functional distinctions between related features. Notably, our method automatically compresses large feature circuits into interpretable supernodes.
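The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each layer comes with a hypothetical projection matrix `proj` into the shared reference space, and it swaps in a sliced (random one-dimensional projection) approximation of the Wasserstein-1 distance for simplicity. All function names and parameters below are illustrative assumptions.

```python
import numpy as np

def feature_distribution(hidden, acts, proj):
    """Represent a feature as a weighted point cloud in the shared space.

    hidden: (n, d) hidden states; acts: (n,) feature activations;
    proj: (d, k) hypothetical map into the shared reference space.
    Weights are activations normalized to sum to 1, so multiplying all
    activations by a positive constant leaves the distribution unchanged.
    """
    hidden = np.asarray(hidden, dtype=float)
    acts = np.asarray(acts, dtype=float)
    mask = acts > 0  # keep only tokens where the feature fires
    weights = acts[mask] / acts[mask].sum()
    return hidden[mask] @ proj, weights

def wasserstein_1d(x, wx, y, wy):
    """Wasserstein-1 distance between two weighted 1-D samples,
    computed as the integral of |CDF_x - CDF_y|."""
    grid = np.sort(np.concatenate([x, y]))
    deltas = np.diff(grid)
    ix, iy = np.argsort(x), np.argsort(y)
    cdf_x = np.concatenate([[0.0], np.cumsum(wx[ix])])
    cdf_y = np.concatenate([[0.0], np.cumsum(wy[iy])])
    fx = cdf_x[np.searchsorted(x[ix], grid[:-1], side="right")]
    fy = cdf_y[np.searchsorted(y[iy], grid[:-1], side="right")]
    return float(np.sum(np.abs(fx - fy) * deltas))

def sliced_wasserstein(pa, wa, pb, wb, n_dirs=64, seed=0):
    """Average 1-D Wasserstein distance over random unit directions
    (an approximation chosen for this sketch, not the paper's solver)."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, pa.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return float(np.mean([wasserstein_1d(pa @ d, wa, pb @ d, wb) for d in dirs]))
```

In this sketch, a cross-layer comparison would call `feature_distribution` once per feature, each with its own layer's projection, and then pass the two point clouds to `sliced_wasserstein`. Smaller values indicate features whose activating hidden states look more alike in the shared space.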

Tue M. Cao, Nguyen Do, My T. Thai • 2026

Related benchmarks

Task	Dataset	Metric	Result
Feature Matching	GPT2 Layer 5 match with Layer 11	LLM Eval	1.56	6
Feature Matching	GPT2 Layer 0 match with Layer 11	LLM Eval Score	1.39	6
Feature Matching	Gemma-2-2B Layer 12 match with Layer 25	LLM Evaluation Score	1.83	6
Feature Matching	Gemma-2-2B Layer 0 match with Layer 25	LLM Eval	1.83	6
Circuit Compression	Gemma-2-2B Digit Addition	Accuracy	61.51	5
Circuit Compression	GPT2-small Digit Addition	Accuracy	68.12	5
Feature Matching	GPT2 Layer 5 match with Layer 6	LLM Eval	2.53	4
Feature Matching	Gemma-2-2B Layer 12 match with Layer 13	LLM Eval	2.32	4

