Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering
About
Despite the strong reasoning capabilities of recent large language models (LLMs), achieving reliable performance on challenging tasks often requires post-training or computationally expensive sampling strategies, which limits their practical efficiency. In this work, we first show that a small subset of neurons in LLMs exhibits strong predictive correlations with reasoning correctness. Based on this observation, we propose AdaRAS (Adaptive Reasoning Activation Steering), a lightweight test-time framework that improves reasoning reliability by selectively intervening on neuron activations. AdaRAS identifies Reasoning-Critical Neurons (RCNs) via a polarity-aware mean-difference criterion and adaptively steers their activations during inference, improving incorrect reasoning traces while avoiding degradation on already-correct ones. Experiments on 10 mathematics and coding benchmarks demonstrate consistent improvements, including gains of over 13% on AIME-24 and AIME-25. Moreover, AdaRAS exhibits strong transferability across datasets and scales to stronger models, outperforming post-training methods without additional training or sampling cost.
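The abstract names two steps: selecting neurons by a polarity-aware mean difference, then steering them adaptively at inference. The toy sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes the criterion ranks neurons by the absolute difference between their mean activation on correct and incorrect traces and keeps the sign of that difference as the neuron's polarity. The gating rule in `steer` is also our own assumption, standing in for the paper's adaptive rule: it moves a neuron toward its correct-trace mean only when the activation falls short in the direction associated with correctness, and it leaves activations that already look "correct" untouched. The functions `select_rcns` and `steer`, the parameters `k` and `alpha`, and the toy activations are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# (neuron index, polarity +1/-1, mean activation on correct traces)
RCN = Tuple[int, int, float]


def column_mean(rows: Sequence[Vector], j: int) -> float:
    """Mean activation of neuron j across a set of traces."""
    return sum(row[j] for row in rows) / len(rows)


def select_rcns(correct: Sequence[Vector], incorrect: Sequence[Vector], k: int) -> List[RCN]:
    """Hypothetical polarity-aware mean-difference selection of the top-k neurons."""
    n = len(correct[0])
    diffs = [column_mean(correct, j) - column_mean(incorrect, j) for j in range(n)]
    # Rank by magnitude of the difference, keep the sign as polarity.
    top = sorted(range(n), key=lambda j: abs(diffs[j]), reverse=True)[:k]
    # Polarity +1: activation tends to be higher on correct traces; -1: lower.
    return [(j, 1 if diffs[j] > 0 else -1, column_mean(correct, j)) for j in top]


def steer(h: Vector, rcns: Sequence[RCN], alpha: float) -> List[float]:
    """Assumed adaptive rule: nudge a selected neuron toward its correct-trace
    mean only when it lies on the side associated with incorrect reasoning."""
    out = list(h)
    for j, polarity, target in rcns:
        gap = target - out[j]
        # gap * polarity > 0: the activation falls short in the "correct" direction.
        if gap * polarity > 0:
            out[j] += alpha * gap
    return out


# Toy activations: rows are traces, columns are three neurons.
correct = [[1.0, 0.0, 0.5], [1.2, 0.1, 0.4]]
incorrect = [[0.2, 0.1, 0.6], [0.4, 0.0, 0.5]]
rcns = select_rcns(correct, incorrect, k=2)

steered = steer([0.3, 0.0, 0.6], rcns, alpha=0.5)  # "incorrect-looking" state
untouched = steer([1.5, 0.0, 0.3], rcns, alpha=0.5)  # already on the correct side
```

In this toy setting, neuron 0 (higher on correct traces) and neuron 2 (lower on correct traces) are selected with opposite polarities. Only the first hidden state is modified. Gating on polarity is one simple way to express "avoiding degradation on already-correct cases"; the paper's actual adaptive rule may differ.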
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Math | GSM8K | Accuracy | 0.8908 | 87 |
| Mathematical Problem Solving | AIME 25 | Accuracy | 54.55 | 54 |
| Code | HumanEval | HumanEval Accuracy | 79.19 | 50 |
| Coding | MBPP | Accuracy | 72.22 | 31 |
| Math | MATH 500 | Accuracy | 86.4 | 25 |
| Code | HumanEval+ | Accuracy | 73.15 | 22 |
| Math | AIME24 | Accuracy | 60.87 | 20 |
| Code | MBPP+ | Accuracy | 60.58 | 6 |
| Math | AIME-Extend | Accuracy | 52.67 | 6 |
| Math | AMC-12 | Accuracy | 70.33 | 6 |