SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

About

Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic activations into sparse sets of monosemantic features, SAEs aim to translate neural network computations into human-understandable concepts. However, common architectures such as TopK SAEs rely on a fixed sparsity level. They enforce the same number of active features (K) across all inputs, ignoring the varying complexity of real-world data. Natural data often lies on manifolds with varying local intrinsic dimensionality, meaning the number of relevant factors can change significantly across samples. This suggests that a fixed sparsity level is not optimal. Simple inputs may require only a few features, while more complex ones need more expressive representations. Using a constant K can therefore introduce noise in simple cases or miss important structure in more complex ones. To address this issue, we propose SoftSAE, a sparse autoencoder with a Dynamic Top-K selection mechanism. Our method uses a differentiable Soft Top-K operator to learn an input-dependent sparsity level k. This allows the model to adjust the number of active features based on the complexity of each input. As a result, the representation better matches the structure of the data, and the explanation length reflects the amount of information in the input. Experimental results confirm that SoftSAE not only finds meaningful features, but also selects the right number of features for each concept. The source code is available at: https://github.com/St0pien/SoftSAE.
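The contrast between a fixed K and an input-dependent k can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes one common relaxation of Top-K: sigmoid gates around a threshold that is found by bisection so that the gates sum to k. The function names, the temperature tau, and the linear predictor used for k are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function written via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def hard_topk(z, k):
    """Fixed-K baseline: keep the k largest entries of z, zero the rest."""
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]
    out[idx] = z[idx]
    return out


def soft_topk_gates(z, k, tau=0.05, iters=60):
    """Relaxed Top-K (assumed form, not taken from the paper).

    Returns gates sigmoid((z - t) / tau) in [0, 1], where the threshold t
    is found by bisection so that the gates sum to k. k may be
    non-integer, but must satisfy 0 < k < len(z).
    """
    lo, hi = z.min() - 1.0, z.max() + 1.0
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        if sigmoid((z - t) / tau).sum() > k:
            lo = t  # too much mass is active: raise the threshold
        else:
            hi = t  # too little mass is active: lower the threshold
    return sigmoid((z - 0.5 * (lo + hi)) / tau)


def predict_k(x, w, b, k_max):
    """Hypothetical input-dependent sparsity level in (1, k_max)."""
    return 1.0 + (k_max - 1.0) * sigmoid(x @ w + b)


def encode(x, W_enc, b_enc, w_k, b_k, k_max):
    """Sparse code whose expected number of active features depends on x."""
    z = np.maximum(W_enc @ x + b_enc, 0.0)  # ReLU pre-activations
    k = predict_k(x, w_k, b_k, k_max)
    return soft_topk_gates(z, k) * z, k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_sae = 8, 32
    W_enc = rng.normal(size=(d_sae, d_in))
    b_enc = np.zeros(d_sae)
    w_k = rng.normal(size=d_in)
    x = rng.normal(size=d_in)
    code, k = encode(x, W_enc, b_enc, w_k, 0.0, k_max=16)
    print("predicted k:", float(k))
    print("entries above 1e-3:", int((code > 1e-3).sum()))
```

In this sketch the gates are smooth functions of the activations, which is what allows a sparsity level to be trained by gradient descent in a framework with automatic differentiation. As tau shrinks, the gates approach a hard selection of roughly k features.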

Jakub Stępień, Marcin Mazur, Jacek Tabor, Przemysław Spurek • 2026

Related benchmarks

Task	Dataset	Result	Rank
Embedding Reconstruction	CLIP vision embeddings CC3M and ImageNet	L0 Error: 62.562	24
Sparse Autoencoder Evaluation	Gemma-2-2B activations	L0 Count: 302.4	20
