
Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

About

Large reasoning models (LRMs) have achieved remarkable success in complex problem-solving, yet they often suffer from computational redundancy or reasoning unfaithfulness. Current methods for shaping LRM behavior typically rely on reinforcement learning or fine-tuning with gold-standard reasoning traces, a paradigm that is both computationally expensive and difficult to scale. In this paper, we reveal that LRMs possess latent *reasoning beliefs* that internally track their own reasoning traits, which can be captured through simple logit probing. Building upon this insight, we propose Reasoning Belief Engineering (RELIEF), a simple yet effective framework that shapes LRM behavior by aligning the model's self-concept with a target belief blueprint. Crucially, RELIEF completely bypasses the need for reasoning-trace supervision. It internalizes desired traits by fine-tuning on synthesized, self-reflective question-answering pairs that affirm the target belief. Extensive experiments on efficiency and faithfulness tasks demonstrate that RELIEF matches or outperforms behavior-supervised and preference-based baselines while requiring lower training costs. Further analysis validates that shifting a model's reasoning belief effectively shapes its actual behavior.
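The "simple logit probing" mentioned in the abstract can be illustrated with a minimal sketch: pose a self-reflective yes/no question to the model (e.g. "Do you keep your reasoning concise?") and compare the logits it assigns to the affirmative and negative answer tokens. The helper below is a hypothetical illustration of that two-token comparison, not the paper's actual implementation; the function name and the example prompt are assumptions.

```python
import math

def belief_score(yes_logit: float, no_logit: float) -> float:
    """Probability mass the model places on affirming the probed trait,
    computed as a two-way softmax over the 'Yes'/'No' token logits."""
    m = max(yes_logit, no_logit)          # subtract max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# Example: logits read off the final position after a self-reflective prompt
# such as "Question: Do you keep your reasoning concise? Answer:".
# A score near 1 would indicate the model already holds the target belief.
score = belief_score(yes_logit=3.1, no_logit=0.9)
```

Under this reading, RELIEF's training data would consist of such self-reflective questions paired with answers that affirm the target belief, so that fine-tuning shifts the probed score toward the blueprint.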

Chak Tou Leong, Dingwei Chen, Heming Xia, Qingyu Yin, Sunbowen Lee, Jian Wang, Wenjie Li • 2026

Related benchmarks

Task                    Dataset                         Metric   Result  Rank
Mathematical Reasoning  MATH500 (test)                  –        –       381
Mathematical Reasoning  AMC23 (test)                    Pass@1   89.3    36
Mathematical Reasoning  Math Benchmarks Overall (test)  Pass@1   86.1    12
Mathematical Reasoning  GSM8K (test)                    Pass@1   95      12
