Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Effectively Controlling Reasoning Models through Thinking Intervention

About

Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers, helping the model excel in complex problem-solving. In this paper, we demonstrate that this emerging generation framework offers a unique opportunity for more fine-grained control over model behavior. We propose Thinking Intervention, a novel paradigm designed to explicitly guide the internal reasoning processes of LLMs by strategically inserting or revising specific thinking tokens. We find that the Thinking Intervention paradigm enhances the capabilities of reasoning models across a wide range of tasks, including instruction following on IFEval and Overthinking, instruction hierarchy on SEP, and safety alignment on XSTest and SorryBench. Our results demonstrate that Thinking Intervention significantly outperforms baseline prompting approaches, achieving up to 6.7% accuracy gains in instruction-following scenarios, 15.4% improvements in reasoning about instruction hierarchies, and a 40.0% increase in refusal rates for unsafe prompts using open-source DeepSeek R1 models. Overall, our work opens a promising new research avenue for controlling reasoning LLMs.

Tong Wu, Chong Xiang, Jiachen T. Wang, G. Edward Suh, Prateek Mittal• 2025

Related benchmarks

TaskDatasetResultRank
Safety EvaluationXSTest Safe
FC6
78
Safety EvaluationXSTest Unsafe
False Compliance Rate (FC)2
78
Mathematical ReasoningMATH500
Pass@193.5
77
Safety EvaluationAdvBench
Reasoning Harmfulness Rate0.00e+0
50
Jailbreak AttackPAIR
Harmful Score2
46
Safety EvaluationXSTest
F1 Score74
44
Safety EvaluationXSTest (combined)
F1 Score88
34
Safety EvaluationSorrybench
Reasoning Success Rate (FFR)15.9
32
Expert knowledge QAGPQA
Pass@160
29
Jailbreak Attack DefenseGCG--
24
Showing 10 of 27 rows

Other info

Follow for update