
Model Whisper: Steering Vectors Unlock Large Language Models' Potential in Test-time

About

It is a critical challenge to efficiently unlock the powerful reasoning potential of Large Language Models (LLMs) for specific tasks or new distributions. Existing test-time adaptation methods often require tuning model parameters, which is not only computationally expensive but also risks degrading the model's pre-existing abilities. To address this, we introduce a lightweight component, Test-Time Steering Vectors (TTSV), which is prepended to the input while keeping the LLM's parameters entirely frozen. By optimizing the TTSV on test data to minimize the model's output entropy, we steer the model towards an internal state of higher confidence, activating its inherent abilities most relevant to the current task. TTSV is both lightweight and highly efficient to optimize, making it a true plug-and-play enhancement. Extensive experiments validate our approach's effectiveness on both base models and reasoning-enhanced models. For instance, on the MATH500 task, TTSV achieves a 45.88% relative performance gain on the Qwen2.5-Math-7B model and a 16.22% relative gain on the Qwen3-4B model. Furthermore, our approach exhibits robust generalization, with its steering vectors proving highly transferable across diverse tasks.
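The core idea in the abstract (prepend a learnable vector, keep the model frozen, minimize output entropy) can be illustrated with a toy sketch. This is not the authors' implementation: the frozen "model" here is a hypothetical mean-pooling layer followed by a fixed output matrix `W`, the input embeddings are made-up values, and the gradient is derived by hand so that the example needs only the Python standard library. A real setup would presumably backpropagate through the frozen LLM instead.

```python
import math

# Frozen toy "model": a fixed output matrix W (vocab_size x hidden_dim).
# Assumption: this stands in for a frozen LLM; it is never updated below.
W = [
    [0.9, -0.2, 0.1, 0.4],
    [-0.3, 0.8, 0.2, -0.1],
    [0.1, 0.3, -0.7, 0.5],
    [0.4, -0.6, 0.3, 0.2],
    [-0.5, 0.1, 0.6, -0.8],
]

# Hypothetical embeddings of three input tokens (hidden_dim = 4).
INPUT_EMBEDDINGS = [
    [0.2, -0.1, 0.4, 0.0],
    [0.5, 0.3, -0.2, 0.1],
    [-0.1, 0.2, 0.1, 0.3],
]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def forward(steering_vector):
    """Prepend the steering vector, mean-pool, and project to output probabilities."""
    sequence = [steering_vector] + INPUT_EMBEDDINGS
    n = len(sequence)
    pooled = [sum(tok[k] for tok in sequence) / n for k in range(len(steering_vector))]
    logits = [sum(w * h for w, h in zip(row, pooled)) for row in W]
    return softmax(logits)


def entropy_gradient(steering_vector):
    """Analytic gradient of the output entropy with respect to the steering vector."""
    probs = forward(steering_vector)
    h = entropy(probs)
    # For softmax logits z: dH/dz_i = -p_i * (log p_i + H).
    grad_logits = [-p * (math.log(p) + h) for p in probs]
    n = len(INPUT_EMBEDDINGS) + 1
    # Chain rule through the mean pooling: dz_i/dv_k = W[i][k] / n.
    return [
        sum(g * row[k] for g, row in zip(grad_logits, W)) / n
        for k in range(len(steering_vector))
    ]


def optimize_steering_vector(steps=300, lr=0.1):
    """Gradient descent on the steering vector only; W stays frozen."""
    v = [0.0] * len(W[0])
    for _ in range(steps):
        grad = entropy_gradient(v)
        v = [vk - lr * gk for vk, gk in zip(v, grad)]
    return v


initial_entropy = entropy(forward([0.0, 0.0, 0.0, 0.0]))
steering_vector = optimize_steering_vector()
final_entropy = entropy(forward(steering_vector))
print(f"entropy before: {initial_entropy:.4f}, after: {final_entropy:.4f}")
```

Only the four numbers in `steering_vector` are trained, which mirrors the abstract's claim that the component is lightweight. The step count and learning rate are arbitrary choices for this toy problem.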

Xinyue Kang, Diwei Shi, Li Chen • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | GPQA | Accuracy | 31.82 | 258 |
| Mathematical Reasoning | AIME 24 | Accuracy | 23.33 | 113 |
| Mathematical Reasoning | Minerva Math | Accuracy | 25.4 | 100 |
| Reasoning | MATH 500 | Accuracy (%) | 60.2 | 59 |
| Reasoning | GPQA | Accuracy | 47.98 | 38 |
| Reasoning | AIME24 | Accuracy | 60 | 22 |
| Mathematical Reasoning | MATH500 | Accuracy | 74.4 | 4 |
| Mathematical Reasoning | Minerva Math | Accuracy | 22.8 | 4 |
| Mathematical Reasoning | Olympiad Bench | Accuracy | 29.8 | 4 |
| Mathematical Reasoning | AMC23 | Accuracy | 65 | 4 |
Showing 10 of 12 rows
