Spherical Steering: Geometry-Aware Activation Rotation for Language Models
About
Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. However, standard approaches typically rely on activation addition, a geometric operation that inevitably alters the magnitude of hidden representations. This raises concerns about representation collapse and degradation of open-ended generation capabilities. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Rather than shifting activations with a fixed vector, our method rotates them along a geodesic toward a target direction, guiding the activation toward the target concept while preserving its norm and thus the integrity of the signal. To further enhance adaptivity, we incorporate a confidence gate that dynamically modulates steering strength based on input uncertainty. Extensive experiments across multiple-choice benchmarks demonstrate that Spherical Steering significantly outperforms addition-based baselines (notably by +10% on TruthfulQA, COPA, and StoryCloze), while simultaneously maintaining the model's general open-ended generation quality. This work highlights the value of geometric consistency, suggesting that norm-preserving rotation is a robust and effective primitive for precise inference-time control.
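The contrast between activation addition and geodesic rotation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `rotate_toward` and `add_steer`, the interpolation fraction `t`, and the scale `alpha` are assumptions made for illustration, and the rotation is written as standard spherical linear interpolation (slerp) between unit directions. The confidence gate is not modeled; under this sketch it would simply supply a per-input value of `t`.

```python
import numpy as np


def rotate_toward(h, target, t, eps=1e-8):
    """Rotate h a fraction t (0..1) of the way along the great-circle
    arc toward `target`, keeping the norm of h unchanged.

    Illustrative slerp-based sketch, not the paper's exact formulation.
    """
    norm = np.linalg.norm(h)
    if norm < eps:
        return h.copy()
    u = h / norm                          # unit direction of the activation
    v = target / np.linalg.norm(target)   # unit target direction
    cos_theta = np.clip(np.dot(u, v), -1.0, 1.0)
    theta = np.arccos(cos_theta)          # angle between u and v
    sin_theta = np.sin(theta)
    if sin_theta < eps:
        # Parallel or antiparallel: the arc is degenerate or ambiguous,
        # so this sketch leaves the activation unchanged.
        return h.copy()
    w = (np.sin((1.0 - t) * theta) * u + np.sin(t * theta) * v) / sin_theta
    return norm * w                       # restore the original magnitude


def add_steer(h, target, alpha):
    """Addition-style baseline: shift h by alpha times the unit target."""
    return h + alpha * target / np.linalg.norm(target)


h = np.array([3.0, 0.0, 4.0])    # toy "activation" with norm 5
mu = np.array([0.0, 1.0, 0.0])   # toy target direction

rotated = rotate_toward(h, mu, t=0.5)
added = add_steer(h, mu, alpha=5.0)

print("norm before:        ", np.linalg.norm(h))
print("norm after rotation:", np.linalg.norm(rotated))
print("norm after addition:", np.linalg.norm(added))
```

In this toy case both operations move the activation closer to the target direction, but only the rotation leaves its magnitude unchanged; the addition baseline grows the norm as `alpha` increases.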
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | WinoGrande (WG) | Accuracy | 52.72 | 98 |
| Multiple-Choice | TruthfulQA | MC1 Accuracy | 49.95 | 83 |
| Story completion | StoryCloze | Accuracy | 89.08 | 65 |
| Question Answering | COPA | Accuracy | 95 | 59 |
| Multiple-choice Question Answering | TruthfulQA MC1 | MC1 Accuracy | 49.95 | 33 |
| Open-ended generation | TruthfulQA Open-ended | True Score | 88.02 | 16 |
| Multiple-choice Question Answering | MMLU | -- | -- | 13 |
| Multiple-choice Question Answering | BoolQ | MC Accuracy | 0.8294 | 5 |