Predicting Where Steering Vectors Succeed

About

Steering vectors work for some concepts and layers but fail for others, and practitioners have no way to predict which setting applies before running an intervention. We introduce the Linear Accessibility Profile (LAP), a per-layer diagnostic that repurposes the logit lens as a predictor of steering vector effectiveness. The key measure, $A_{\mathrm{lin}}$, applies the model's unembedding matrix to intermediate hidden states, requiring no training. Across 24 controlled binary concept families on five models (Pythia-2.8B to Llama-8B), peak $A_{\mathrm{lin}}$ predicts steering effectiveness at $\rho = +0.86$ to $+0.91$ and layer selection at $\rho = +0.63$ to $+0.92$. A three-regime framework explains when difference-of-means steering works, when nonlinear methods are needed, and when no method can work. An entity-steering demo confirms the prediction end-to-end: steering at the LAP-recommended layer redirects completions on Gemma-2-2B and OLMo-2-1B-Instruct, while the middle layer (the standard heuristic) has no effect on either model.
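The logit-lens idea behind the profile, and the difference-of-means steering it is meant to predict, can be sketched on synthetic data. This is a minimal illustration and not the paper's implementation: the abstract does not give the exact definition of $A_{\mathrm{lin}}$, so the sketch substitutes a simple stand-in (the fraction of examples for which the unembedded hidden state prefers the correct one of two candidate tokens). The function names, array shapes, and toy data are all assumptions made for the example, and the final layer norm that a real logit lens would usually apply is omitted.

```python
import numpy as np

def accessibility_profile(hidden, W_U, pos_token, neg_token, labels):
    """Per-layer logit-lens accuracy on a binary concept (hypothetical proxy
    for A_lin; the paper's exact definition is not stated in the abstract).

    hidden: (n_layers, n_examples, d_model) intermediate hidden states
    W_U:    (d_model, vocab) unembedding matrix
    labels: (n_examples,) 1 if pos_token is the correct token, else 0
    """
    logits = hidden @ W_U                                   # (L, N, vocab)
    margin = logits[..., pos_token] - logits[..., neg_token]
    pred = (margin > 0).astype(int)                         # (L, N)
    return (pred == labels).mean(axis=1)                    # (L,)

def diff_of_means_vector(hidden_layer, labels):
    """Difference-of-means steering vector at one layer."""
    return hidden_layer[labels == 1].mean(axis=0) - hidden_layer[labels == 0].mean(axis=0)

# Toy data (assumed): the concept becomes linearly readable in later layers.
rng = np.random.default_rng(0)
n_examples, d_model, vocab = 40, 8, 5
pos_token, neg_token = 1, 0
W_U = rng.normal(size=(d_model, vocab))
direction = W_U[:, pos_token] - W_U[:, neg_token]
labels = np.arange(n_examples) % 2
sign = 2 * labels - 1
strengths = [0.0, 0.5, 2.0]                                 # one per layer
hidden = np.stack([
    s * sign[:, None] * direction[None, :]
    + 0.1 * rng.normal(size=(n_examples, d_model))
    for s in strengths
])

profile = accessibility_profile(hidden, W_U, pos_token, neg_token, labels)
layer = len(strengths) - 1                                  # steer at the last toy layer
v = diff_of_means_vector(hidden[layer], labels)

# Steer one negative-class example and re-read it through the unembedding.
h_neg = hidden[layer, 0]                                    # labels[0] == 0
margin_before = (h_neg @ W_U)[pos_token] - (h_neg @ W_U)[neg_token]
h_steered = h_neg + v
margin_after = (h_steered @ W_U)[pos_token] - (h_steered @ W_U)[neg_token]
```

In this toy setup the first layer carries no concept signal, so its profile value sits near chance, while the last layer is fully readable and adding the steering vector flips the logit-lens margin of the negative example. On a real model, the hidden states would come from a forward pass with intermediate activations recorded, and the layer would be chosen from the profile rather than fixed by hand.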

Jayadev Billa • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Steering effectiveness correlation analysis | Gemma-2-2B concept families (across 26 layers) | -- | 6 |
| Steerability Correlation Analysis | Binary Concept Families All | Correlation coefficient ($\rho$): 0.93 | 5 |
| Steerability Correlation Analysis | Binary Concept Families Controlled | Correlation coefficient ($\rho$): 0.91 | 5 |
| Steerability Correlation Analysis | Binary Concept Families Controlled, $A_{\mathrm{lin}} > 0.05$ | Correlation coefficient ($\rho$): 0.9 | 5 |
| Steerability Correlation Analysis | Binary Concept Families Controlled, $A_{\mathrm{lin}} > 0.1$ | Correlation coefficient ($\rho$): 0.86 | 5 |
| Linear Concept Accessibility | Word transform | Peak $A_{\mathrm{lin}}$: 0.715 | 4 |
| Linear Concept Accessibility | Analogy | Peak $A_{\mathrm{lin}}$: 0.765 | 4 |
| Linear Concept Accessibility and Steering | Arithmetic | Peak $A_{\mathrm{lin}}$: 0.995 | 4 |
| Linear Concept Accessibility and Steering | Geography | Peak $A_{\mathrm{lin}}$: 0.68 | 4 |
| Linear Concept Accessibility and Steering | Sequence | Peak $A_{\mathrm{lin}}$: 0.82 | 4 |
Showing 10 of 20 rows
