Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

About

Large reasoning models (LRMs) have achieved remarkable success in complex problem-solving, yet they often suffer from computational redundancy or reasoning unfaithfulness. Current methods for shaping LRM behavior typically rely on reinforcement learning or fine-tuning with gold-standard reasoning traces, a paradigm that is both computationally expensive and difficult to scale. In this paper, we reveal that LRMs possess latent reasoning beliefs that internally track their own reasoning traits, which can be captured through simple logit probing. Building upon this insight, we propose Reasoning Belief Engineering (RELIEF), a simple yet effective framework that shapes LRM behavior by aligning the model's self-concept with a target belief blueprint. Crucially, RELIEF completely bypasses the need for reasoning-trace supervision. It internalizes desired traits by fine-tuning on synthesized, self-reflective question-answering pairs that affirm the target belief. Extensive experiments on efficiency and faithfulness tasks demonstrate that RELIEF matches or outperforms behavior-supervised and preference-based baselines while requiring lower training costs. Further analysis validates that shifting a model's reasoning belief effectively shapes its actual behavior.
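
The abstract does not spell out the probing procedure, so the following is only a toy sketch of what a logit probe of a reasoning belief could look like. It assumes the probe reads the model's next-token logits for a small set of answer options to a self-reflective question and normalizes them with a softmax restricted to those options. The function name `belief_score`, the prompt wording, and the logit values are all hypothetical and are not taken from the paper.

```python
import math


def belief_score(option_logits: dict[str, float], target: str) -> float:
    """Probability of `target` under a softmax restricted to the given options.

    Hypothetical helper: a stand-in for a logit probe, not the paper's method.
    """
    # Subtract the maximum logit for numerical stability before exponentiating.
    max_logit = max(option_logits.values())
    exps = {opt: math.exp(val - max_logit) for opt, val in option_logits.items()}
    return exps[target] / sum(exps.values())


# Invented next-token logits for a self-reflective prompt such as
# "Is my reasoning concise? Answer:" (values chosen for illustration only).
logits = {"Yes": 2.0, "No": 0.0}
score = belief_score(logits, "Yes")
print(f"belief in 'Yes': {score:.3f}")
```

In this reading, a score near 1 would indicate that the model strongly affirms the probed trait, and comparing the score before and after fine-tuning would show whether the belief has shifted.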

Chak Tou Leong, Dingwei Chen, Heming Xia, Qingyu Yin, Sunbowen Lee, Jian Wang, Wenjie Li • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH500 (test)	--	895
Mathematical Reasoning	AMC23 (test)	Pass@1 89.3	61
Mathematical Reasoning	Math Benchmarks Overall (test)	Pass@1 86.1	12
Mathematical Reasoning	GSM8K (test)	Pass@1 95	12
