
DISCO: Disentangled Communication Steering for Large Language Models

About

A variety of recent methods guide large language model outputs by adding steering vectors to residual-stream or attention-head representations at inference time. In contrast, we propose to inject steering vectors directly into the query and value representation spaces within attention heads. We provide evidence that, compared with attention head outputs, a greater portion of these spaces exhibits high linear discriminability of concepts, a key property motivating the use of steering vectors. We analytically characterize the effect of our method, which we term DISentangled COmmunication (DISCO) Steering, on attention head outputs. Our analysis reveals that DISCO disentangles a strong but underutilized baseline, steering attention inputs, which implicitly modifies queries and values in a rigid manner. DISCO's direct modulation of these components, by contrast, enables more granular control. We find that DISCO outperforms a number of steering vector baselines across multiple datasets on LLaMA 3.1 8B and Gemma 2 9B, with steering efficacy scores up to 19.1% higher than the runner-up. Our results support the conclusion that the query and value spaces are powerful building blocks for steering vector methods.
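The central mechanism described above can be illustrated on a single toy attention head. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses one head with random weights, no layer normalization, no rotary position embeddings and no output projection. The helper names (`steer_head`, `project`, `dq`, `dv`) are hypothetical. The sketch checks two properties that follow from the attention equations.

First, adding a vector to every value shifts the head output by exactly that vector, because the attention weights sum to 1. Second, adding one vector delta to the attention input at every position is equivalent to adding the tied pair W_q delta to the query and W_v delta to the values. This holds because the induced key shift adds the same constant to every attention logit, and softmax ignores it. The tied pair is one way to read the claim that input steering modifies queries and values "in a rigid manner", whereas DISCO chooses the query and value vectors directly.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [dot(row, x) for row in W]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_output(q, keys, values):
    """Scaled dot-product attention of one query over all positions (single head)."""
    weights = softmax([dot(q, k) / math.sqrt(len(q)) for k in keys])
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        out = [o + w * vi for o, vi in zip(out, v)]
    return out


def steer_head(q, keys, values, dq=None, dv=None):
    """Hypothetical DISCO-style steering: add dq to the query and dv to every value."""
    if dq is not None:
        q = add(q, dq)
    if dv is not None:
        values = [add(v, dv) for v in values]
    return head_output(q, keys, values)


# Toy setup (assumed sizes and random weights, for illustration only).
random.seed(0)
d_model, d_head, n_tokens = 4, 3, 5


def rand_mat(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


W_q, W_k, W_v = rand_mat(d_head, d_model), rand_mat(d_head, d_model), rand_mat(d_head, d_model)
xs = rand_mat(n_tokens, d_model)  # attention inputs; the last row is the query position
delta = [random.gauss(0.0, 1.0) for _ in range(d_model)]  # toy input-space steering vector


def project(inputs):
    """Return the last position's query plus keys and values for all positions."""
    q = matvec(W_q, inputs[-1])
    keys = [matvec(W_k, x) for x in inputs]
    values = [matvec(W_v, x) for x in inputs]
    return q, keys, values


q, keys, values = project(xs)
base = head_output(q, keys, values)

# 1) Value steering: attention weights sum to 1, so the output shifts by exactly dv.
dv = [0.5, -1.0, 2.0]
value_steered = steer_head(q, keys, values, dv=dv)

# 2) Input steering: add delta to the attention input at every position.
input_steered = head_output(*project([add(x, delta) for x in xs]))
# The key shift W_k delta adds the same constant to every logit, which softmax ignores,
# so input steering equals query/value steering with the tied pair (W_q delta, W_v delta).
tied = steer_head(q, keys, values, dq=matvec(W_q, delta), dv=matvec(W_v, delta))

# Both printed gaps should be near zero (floating-point rounding only).
print(max(abs(a - (b + c)) for a, b, c in zip(value_steered, base, dv)))
print(max(abs(a - b) for a, b in zip(input_steered, tied)))
```

Both properties are exact identities under these assumptions, so the printed gaps reflect only rounding error. With rotary position embeddings the key shift becomes position-dependent, and with normalization applied after the steering vector is added the shift is no longer linear. In either setting the second equivalence is not guaranteed.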

Max Torop, Aria Masoomi, Masih Eskandar, Jennifer Dy · 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Truthfulness Steering | TruthfulQA | T×I Score | 78.66 | 28 |
| Instruction Following | IFBench | Accuracy | 11.5 | 18 |
| Cognitive style steering | Bloom's Taxonomy Phi generations (test) | Remember Hit Rate | 3.9 | 14 |
| Model Steering | Steering Evaluation Suite Power, Wealth, Corr, TQA Gemma-2-9B-IT (test) | Power | 2.61 | 10 |
| Question Answering | TruthfulQA | True*Info Score (TQA) | 81.6 | 10 |
| Steering | Power | LLM Judge Score | 2.91 | 10 |
| Steering | Wealth | LLM Judge Score | 2.25 | 10 |
| Steering | Corrigibility | LLM Judge Score | 3.22 | 10 |
| Mathematical Reasoning | GSM8K | Accuracy | 22.5 | 10 |
| Question Answering | ARC Challenge | Accuracy | 34.2 | 10 |
Showing 10 of 14 rows
