Spherical Steering: Geometry-Aware Activation Rotation for Language Models

About

Inference-time steering offers a promising way to control language models (LMs) without retraining. However, standard approaches typically rely on activation addition, which inevitably alters hidden-state magnitudes, raising concerns about representation collapse and degraded open-ended generation. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Rather than shifting activations with a fixed vector, our method rotates them along a geodesic toward a target direction, preserving signal integrity while steering toward the target concept. To further enhance adaptivity, we incorporate a confidence gate that dynamically modulates steering strength based on input uncertainty. Extensive experiments across multiple-choice benchmarks demonstrate that Spherical Steering significantly outperforms addition-based baselines (notably by +10% on TruthfulQA, COPA, and StoryCloze) while maintaining the model's general open-ended generation quality. This work highlights the value of geometric consistency, suggesting that norm-preserving rotation is a robust and effective primitive for precise inference-time control. The code is available at: https://github.com/chili-lab/Spherical-Steering.

Zejia You, Chunyuan Deng, Hanjie Chen • 2026
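
To make the contrast in the abstract concrete, here is a minimal sketch of rotating a hidden state along the great-circle arc toward a target direction, next to plain activation addition. It is an illustration under stated assumptions, not the authors' implementation: the function names `rotate_toward` and `add_steer`, the fraction parameter `t`, and the use of spherical linear interpolation are all hypothetical choices made here, and the toy 3-dimensional vectors stand in for real hidden states.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def rotate_toward(h, target, t):
    """Rotate h a fraction t (0..1) of the angle toward target, keeping ||h||.

    Hypothetical helper: uses spherical linear interpolation between the
    unit vectors of h and target, then rescales to the original norm of h.
    """
    h_norm, t_norm = norm(h), norm(target)
    if h_norm == 0.0 or t_norm == 0.0:
        return list(h)  # no direction to rotate from or toward
    u = [x / h_norm for x in h]
    d = [x / t_norm for x in target]
    cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, d))))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < 1e-8:
        return list(h)  # (anti)parallel: arc is trivial or not unique
    w_u = math.sin((1.0 - t) * theta) / sin_theta
    w_d = math.sin(t * theta) / sin_theta
    return [h_norm * (w_u * a + w_d * b) for a, b in zip(u, d)]


def add_steer(h, target, alpha):
    """Addition-style baseline: shift h by alpha * target (norm may change)."""
    return [a + alpha * b for a, b in zip(h, target)]


h = [3.0, 4.0, 0.0]       # toy hidden state with norm 5
target = [0.0, 0.0, 1.0]  # toy target direction

rotated = rotate_toward(h, target, t=0.5)
added = add_steer(h, target, alpha=2.0)

print("norm before:        ", norm(h))
print("norm after rotation:", norm(rotated))
print("norm after addition:", norm(added))
```

In this sketch `t` is a fixed constant. The abstract describes a confidence gate that modulates steering strength based on input uncertainty; its form is not given here, so it is not sketched.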

Related benchmarks

Task	Dataset	Result
Question Answering	WinoGrande (WG)	Accuracy 52.72	138
Multiple-Choice	TruthfulQA	MC1 Accuracy 49.95	83
Story completion	StoryCloze	Accuracy 89.08	80
Question Answering	COPA	Accuracy 95	59
Multiple-choice Question Answering	MMLU	MMLU Accuracy (Overall) 62.05	52
Multiple-choice Question Answering	BoolQ	MC Accuracy 0.8294	46
Multiple-choice Question Answering	TruthfulQA MC1	MC1 Accuracy 49.95	39
Open-ended generation	TruthfulQA Open-ended	True Score 88.02	16
