Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation

About

Activation verbalization explains hidden representations in natural language, but existing methods are mostly limited to self-explanation, where each model explains only its own activations. We introduce Universal Activation Verbalizer (UAV), a framework that uses a shared decoder to explain activations from heterogeneous donor models. UAV learns a lightweight adapter that converts donor activations into soft tokens in the decoder's embedding space. It further supports adapter-only transfer: a frozen decoder-side LoRA is reused, and only a new adapter is trained for another donor. Across classification, fact retrieval, and gist summarization, UAV remains competitive with strong self-explanation baselines while enabling cross-model verbalization across model families and scales. Ablations show that decoder-side tuning mainly improves task behavior, whereas the adapter supplies the activation-grounded factual and semantic information needed for faithful explanations.
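The adapter described above can be pictured as a projection from one donor activation to a handful of soft tokens that the decoder reads like ordinary input embeddings. The sketch below is a minimal pure-Python illustration under our own assumptions, not the authors' implementation: the single linear layer, the number of soft tokens `k`, prepending soft tokens to the prompt, and every name (`SoftTokenAdapter`, `lora_param_count`, `trainable_param_count`) are hypothetical. It also shows the bookkeeping behind adapter-only transfer: the decoder-side LoRA stays frozen, so only the new adapter's weights are trainable.

```python
import random
from typing import Dict, List, Set

Vector = List[float]
Matrix = List[List[float]]


def init_linear(in_dim: int, out_dim: int, seed: int = 0) -> Matrix:
    """Random out_dim x in_dim weight matrix (placeholder for learned weights)."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class SoftTokenAdapter:
    """Hypothetical adapter: one linear map from a donor activation of width
    d_donor to k soft tokens, each of width d_decoder."""

    def __init__(self, d_donor: int, d_decoder: int, k: int, seed: int = 0):
        self.d_donor, self.d_decoder, self.k = d_donor, d_decoder, k
        self.weight = init_linear(d_donor, k * d_decoder, seed)

    def __call__(self, activation: Vector) -> List[Vector]:
        if len(activation) != self.d_donor:
            raise ValueError("activation width does not match d_donor")
        flat = matvec(self.weight, activation)
        # Split the flat projection into k vectors in the decoder's embedding space.
        return [flat[i * self.d_decoder:(i + 1) * self.d_decoder] for i in range(self.k)]

    def num_params(self) -> int:
        return self.d_donor * self.k * self.d_decoder


def build_decoder_input(soft_tokens: List[Vector], prompt_embeddings: List[Vector]) -> List[Vector]:
    # Assumption: soft tokens are prepended to the embedded question/prompt.
    return soft_tokens + prompt_embeddings


def lora_param_count(d_model: int, rank: int) -> int:
    # One LoRA pair (A: rank x d_model, B: d_model x rank) on a single square projection.
    return 2 * d_model * rank


def trainable_param_count(param_counts: Dict[str, int], frozen: Set[str]) -> int:
    # Sum parameter counts of every module that is not frozen.
    return sum(n for name, n in param_counts.items() if name not in frozen)


# Adapter-only transfer: the decoder-side LoRA is reused frozen, and only a
# fresh adapter for a new donor (here with activation width 8) is trained.
d_decoder, k = 4, 2
adapter_b = SoftTokenAdapter(d_donor=8, d_decoder=d_decoder, k=k, seed=1)
soft_tokens = adapter_b([1.0] * 8)
prompt = [[0.0] * d_decoder for _ in range(3)]  # stand-in for 3 embedded prompt tokens
decoder_input = build_decoder_input(soft_tokens, prompt)

params = {"decoder_lora": lora_param_count(d_decoder, rank=2), "adapter_b": adapter_b.num_params()}
print(len(decoder_input), trainable_param_count(params, frozen={"decoder_lora"}))
```

Running it prints `5 64`: five input vectors (two soft tokens plus three prompt positions) and 64 trainable weights, all belonging to the new adapter. Keeping the donor-specific part this small is what makes swapping in another donor cheap in this sketch.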

Haiyan Zhao, Zirui He, Guanchu Wang, Ali Payani, Yingcong Li, Mengnan Du • 2026

Related benchmarks

Task	Dataset	Metric	Result
Fact Retrieval	Generative QA Protocol Fact Retrieval	ROUGE-L	29.5	13
Fact Retrieval	Fact Retrieval	Tok-F1	29.4	13
Overall Generation Quality	Generative QA Protocol Overall	ROUGE-L	28.6	13
Classification	Generative QA Protocol Classification	ROUGE-L	0.289	13
Gist Summarization	Generative QA Protocol Gist Summarization	ROUGE-L	0.281	13
Gist Summarization	Gist Summarization	Tok-F1	30.1	13
Classification	Classification task dataset	Tok-F1	30.9	13
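The table reports ROUGE-L and Tok-F1. The sketch below shows one common way to compute both for a single prediction/reference pair, under stated assumptions: lowercased whitespace tokenization, ROUGE-L as the F1 (beta = 1) of longest-common-subsequence precision and recall, and Tok-F1 as SQuAD-style bag-of-tokens F1. The paper's exact tokenization and normalization may differ. Both functions return scores in [0, 1]; multiply by 100 to compare with the percentage-scale entries above.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumed normalization: lowercase + whitespace split (no punctuation stripping).
    return text.lower().split()


def lcs_length(a: List[str], b: List[str]) -> int:
    # Classic O(len(a) * len(b)) dynamic program, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def token_f1(prediction: str, reference: str) -> float:
    pred, ref = tokenize(prediction), tokenize(reference)
    # Multiset intersection: each shared token counts at most min(count_pred, count_ref) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


pred = "The capital of France is Paris"
ref = "Paris is the capital of France"
print(round(rouge_l_f1(pred, ref), 4), token_f1(pred, ref))
```

For this pair every token overlaps, so token F1 is 1.0, but the longest common subsequence ("the capital of france") covers only 4 of 6 tokens, so ROUGE-L is 2/3 (printed as `0.6667 1.0`). ROUGE-L rewards preserved word order, while token F1 ignores it.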
