
Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation

About

Activation verbalization explains hidden representations in natural language, but existing methods are mostly limited to self-explanation, where each model explains only its own activations. We introduce the Universal Activation Verbalizer (UAV), a framework that uses a shared decoder to explain activations from heterogeneous donor models. UAV learns a lightweight adapter that converts donor activations into soft tokens in the decoder's embedding space. It also supports adapter-only transfer: a frozen decoder-side LoRA is reused, and only a new adapter is trained for each additional donor. Across classification, fact retrieval, and gist summarization, UAV remains competitive with strong self-explanation baselines while enabling cross-model verbalization across model families and scales. Ablations show that decoder-side tuning mainly improves task behavior, whereas the adapter provides the activation-grounded factual and semantic information needed for faithful explanations.
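
To make the adapter idea concrete, here is a minimal sketch of how a donor activation could be turned into soft tokens and placed in front of a decoder prompt. Everything in it is an assumption for illustration: the abstract does not specify the adapter architecture, so a single linear projection is used, and the names `LinearAdapter`, `build_decoder_inputs`, `num_soft_tokens`, and all dimensions are hypothetical. The decoder itself and its LoRA weights are not modeled.

```python
import numpy as np


class LinearAdapter:
    """Hypothetical adapter: maps one donor activation to k soft tokens.

    Assumption: a single linear projection. The paper's actual adapter
    architecture is not described in the abstract.
    """

    def __init__(self, d_donor, d_decoder, num_soft_tokens, seed=0):
        rng = np.random.default_rng(seed)
        self.num_soft_tokens = num_soft_tokens
        self.d_decoder = d_decoder
        # One weight matrix producing all soft tokens at once.
        self.W = rng.normal(
            0.0, d_donor ** -0.5, size=(d_donor, num_soft_tokens * d_decoder)
        )
        self.b = np.zeros(num_soft_tokens * d_decoder)

    def __call__(self, activation):
        # activation: (d_donor,) -> soft tokens: (num_soft_tokens, d_decoder)
        flat = activation @ self.W + self.b
        return flat.reshape(self.num_soft_tokens, self.d_decoder)


def build_decoder_inputs(soft_tokens, prompt_embeddings):
    """Prepend soft tokens to the embedded prompt along the sequence axis."""
    return np.concatenate([soft_tokens, prompt_embeddings], axis=0)


# Two donors with different hidden sizes share one decoder embedding space.
d_decoder = 16
adapter_a = LinearAdapter(d_donor=32, d_decoder=d_decoder, num_soft_tokens=4)
adapter_b = LinearAdapter(d_donor=48, d_decoder=d_decoder, num_soft_tokens=4, seed=1)

prompt = np.zeros((5, d_decoder))  # stand-in for an embedded text prompt
inputs_a = build_decoder_inputs(adapter_a(np.ones(32)), prompt)
inputs_b = build_decoder_inputs(adapter_b(np.ones(48)), prompt)
```

In this sketch, both donors yield decoder inputs of the same shape even though their hidden sizes differ, which is what would let a single frozen decoder serve several donors while only the per-donor adapter is trained.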

Haiyan Zhao, Zirui He, Guanchu Wang, Ali Payani, Yingcong Li, Mengnan Du • 2026

Related benchmarks

Task                        Dataset                                     Metric    Result  Rank
Fact Retrieval              Generative QA Protocol Fact Retrieval       ROUGE-L   29.5    13
Fact Retrieval              Fact Retrieval                              Tok-F1    29.4    13
Overall Generation Quality  Generative QA Protocol Overall              ROUGE-L   28.6    13
Classification              Generative QA Protocol Classification       ROUGE-L   0.289   13
Gist Summarization          Generative QA Protocol Gist Summarization   ROUGE-L   0.281   13
Gist Summarization          Gist Summarization                          Tok-F1    30.1    13
Classification              Classification task dataset                 Tok-F1    30.9    13
