Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation
About
Activation verbalization explains hidden representations in natural language, but existing methods are mostly limited to self-explanation, where each model explains only its own activations. We introduce the Universal Activation Verbalizer (UAV), a framework that uses a shared decoder to explain activations from heterogeneous donor models. UAV learns a lightweight adapter that converts donor activations into soft tokens in the decoder's embedding space. It also supports adapter-only transfer: a frozen decoder-side LoRA is reused, and only a new adapter is trained for each additional donor. Across classification, fact retrieval, and gist summarization, UAV remains competitive with strong self-explanation baselines while enabling cross-model verbalization across model families and scales. Ablations show that decoder-side tuning mainly improves task behavior, whereas the adapter provides the activation-grounded factual and semantic information needed for faithful explanations.
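To make the adapter idea concrete, here is a minimal, dependency-free sketch of mapping one donor activation vector to a few soft tokens in a decoder's embedding space. Everything in it is an assumption for illustration: the abstract does not specify the adapter architecture, so a plain per-token linear projection is used, and the names `make_adapter`, `adapt`, `build_decoder_input`, and the dimensions are hypothetical. A real implementation would use a tensor library and train the weights.

```python
import random

def make_adapter(d_donor, d_dec, n_soft, seed=0):
    """Create a hypothetical linear adapter: one (d_dec x d_donor) matrix per soft token."""
    rng = random.Random(seed)
    scale = d_donor ** -0.5  # keep initial outputs at a modest scale
    weights = [
        [[rng.gauss(0.0, scale) for _ in range(d_donor)] for _ in range(d_dec)]
        for _ in range(n_soft)
    ]
    biases = [[0.0] * d_dec for _ in range(n_soft)]
    return weights, biases

def adapt(activation, adapter):
    """Project a donor activation (length d_donor) to n_soft vectors of length d_dec."""
    weights, biases = adapter
    return [
        [sum(w * a for w, a in zip(row, activation)) + b for row, b in zip(matrix, bias)]
        for matrix, bias in zip(weights, biases)
    ]

def build_decoder_input(soft_tokens, prompt_embeddings):
    """Prepend soft tokens to the embedded text prompt (assumed ordering)."""
    return soft_tokens + prompt_embeddings

# Two donors with different hidden sizes share one decoder width (d_dec = 4).
adapter_a = make_adapter(d_donor=8, d_dec=4, n_soft=3, seed=0)
adapter_b = make_adapter(d_donor=16, d_dec=4, n_soft=3, seed=1)

soft_a = adapt([0.1] * 8, adapter_a)
soft_b = adapt([0.1] * 16, adapter_b)
prompt = [[0.0] * 4, [0.0] * 4]  # stand-in for two embedded prompt tokens
decoder_input = build_decoder_input(soft_a, prompt)
```

Under these assumptions, adapter-only transfer corresponds to creating `adapter_b` for a new donor while leaving the decoder untouched: only the adapter depends on the donor's hidden size, and both adapters emit vectors of the decoder's width.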
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fact Retrieval | Generative QA Protocol Fact Retrieval | ROUGE-L | 29.5 | 13 |
| Fact Retrieval | Fact Retrieval | Tok-F1 | 29.4 | 13 |
| Overall Generation Quality | Generative QA Protocol Overall | ROUGE-L | 28.6 | 13 |
| Classification | Generative QA Protocol Classification | ROUGE-L | 0.289 | 13 |
| Gist Summarization | Generative QA Protocol Gist Summarization | ROUGE-L | 0.281 | 13 |
| Gist Summarization | Gist Summarization | Tok-F1 | 30.1 | 13 |
| Classification | Classification task dataset | Tok-F1 | 30.9 | 13 |
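The table does not define Tok-F1. Assuming it refers to the bag-of-tokens F1 commonly used in QA evaluation, the following sketch shows how such a score could be computed for one prediction. The lowercase whitespace tokenization and the function name `token_f1` are assumptions, not details taken from the source.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-level F1 between two strings, using lowercase whitespace tokens (assumed)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = token_f1("the cat", "the cat sat")  # precision 1.0, recall 2/3
```

This returns a value in [0, 1]; leaderboards often report it multiplied by 100.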