COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics

About

Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a fundamental trade-off: sample-efficient methods suboptimally capture steering signals from labeled examples, while methods that better extract these signals require hundreds to thousands of examples. We introduce COLD-Steer, a training-free framework that steers LLM activations by approximating the representational changes that would result from gradient descent on in-context examples. Our key insight is that the effect of fine-tuning on a small set of examples can be efficiently approximated at inference time without actual parameter updates. We formalize this through two complementary approaches: (i) a unit kernel approximation method that updates the activations directly using gradients with respect to them, normalized across examples, and (ii) a finite-difference approximation requiring only two forward passes regardless of example count. Experiments across a variety of steering tasks and benchmarks demonstrate that COLD-Steer achieves up to 95% steering effectiveness while using 50 times fewer samples than the best baseline. COLD-Steer makes it possible to accommodate diverse perspectives without extensive demonstration data, which we validate through experiments on pluralistic alignment tasks. Our framework opens new possibilities for adaptive, context-aware model control that can flexibly address varying loss-driven human preferences through principled approximation of learning dynamics rather than specialized training procedures.
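The two approximations named in the abstract can be illustrated on a toy model. The sketch below is not the authors' implementation: the one-layer tanh network, the logistic loss, and every name in it (`hidden`, `steer_by_activation_gradient`, `finite_difference_shift`, `step`, `eps`) are assumptions made for illustration. It reads "normalized across examples" as averaging unit-norm per-example gradients, and it shows only the generic finite-difference identity for the activation change along a given parameter direction, which needs two forward passes however that direction was obtained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3
W = rng.normal(size=(d_hid, d_in))  # toy layer weights (stand-in for an LLM layer)
v = rng.normal(size=d_hid)          # toy readout vector


def hidden(W, x):
    """Toy activation at the steered layer."""
    return np.tanh(W @ x)


def loss_from_hidden(h, y):
    """Logistic loss for a label y in {-1, +1}."""
    return np.log1p(np.exp(-y * (v @ h)))


def grad_wrt_hidden(h, y):
    """Analytic gradient of the logistic loss with respect to h."""
    z = y * (v @ h)
    return -y * v / (1.0 + np.exp(z))


def steer_by_activation_gradient(h_query, examples, step=0.5):
    """Assumed reading of approach (i): step against the mean of
    unit-normalized per-example gradients taken w.r.t. the activations."""
    grads = [grad_wrt_hidden(hidden(W, x), y) for x, y in examples]
    unit = [g / (np.linalg.norm(g) + 1e-12) for g in grads]
    return h_query - step * np.mean(unit, axis=0)


def finite_difference_shift(W, direction, x_query, eps=1e-4):
    """Generic finite difference in the spirit of approach (ii): the
    activation change per unit move along `direction` in parameter space,
    computed from exactly two forward passes."""
    return (hidden(W + eps * direction, x_query) - hidden(W, x_query)) / eps


# Hypothetical in-context examples, all labeled +1.
examples = [(rng.normal(size=d_in), 1) for _ in range(5)]
x_query = rng.normal(size=d_in)
h_query = hidden(W, x_query)
steered = steer_by_activation_gradient(h_query, examples)

direction = rng.normal(size=W.shape)  # stands in for a parameter update direction
shift = finite_difference_shift(W, direction, x_query)
```

In this toy setting the steered activation lowers the loss for the demonstrated label, and the cost of `finite_difference_shift` does not depend on how many examples were used to build `direction`.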

Kartik Sharma, Rakshit S. Trivedi • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Behavior selection	CAA (50 random samples)	Accuracy (coordinate-ais, pair)	98	22
Behavior selection	BiPO (test)	Hallucination Pair Accuracy	64	14
Hallucination Steering	CAA	Runtime	31.14	13
Open-ended behavior generation	CAA	CoAIS Score	4.36	10
Hallucination	CAA	Accuracy (pair)	88	8
Behavior selection	CAA behaviors	CoAIS Score	4.64	5
Behavior Generation	BiPO	Hallucination Score	1.62	5
Behavior Generation	CAA behaviors Qwen-2.5-7B-Instruct (test)	CoAIS	26	5
Distributional pluralistic alignment	OpinionsQA	Democrat KL Divergence	1	4
