On the Provable Performance Guarantee of Efficient Reasoning Models

About

Large reasoning models (LRMs) have achieved remarkable progress in complex problem-solving tasks. Despite this success, LRMs typically suffer from high computational costs during deployment, highlighting a need for efficient inference. A practical direction of efficiency improvement is to switch the LRM between thinking and non-thinking modes dynamically. However, such approaches often introduce additional reasoning errors and lack statistical guarantees for the performance loss, which are critical for high-stakes applications. In this work, we propose Probably Approximately Correct (PAC) reasoning that controls the performance loss under the user-specified tolerance. Specifically, we construct an upper confidence bound on the performance loss and determine a threshold for switching to the non-thinking model. Theoretically, using the threshold to switch between the thinking and non-thinking modes ensures bounded performance loss in a distribution-free manner. Our comprehensive experiments on reasoning benchmarks show that the proposed method can save computational budgets and control the user-specified performance loss.

Hao Zeng, Jianguo Huang, Bingyi Jing, Hongxin Wei, Bo An• 2025

Related benchmarks

Task	Dataset	Result
Logical reasoning	ZebraLogic (test)	--	90
Mathematical Reasoning	MATH-500 (test)	--	46
Open-domain task	Arena Hard	Error (%)8.63	12
Open-domain task	Arena-Hard (test)	Error13.16	12
Scientific Reasoning	GPQA (test)	Error Rate (%)11.44	6
Reasoning	GPQA	Error Rate (%)10.86	6
Reasoning	MATH 500	Error Rate3.08	6
Reasoning	ZebraLogic	Error Rate (%)4.76	6

Showing 8 of 8 rows

Other info

Follow for update

@wizwand_team Discord