Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

On the Provable Performance Guarantee of Efficient Reasoning Models

About

Large reasoning models (LRMs) have achieved remarkable progress in complex problem-solving tasks. Despite this success, LRMs typically suffer from high computational costs during deployment, highlighting a need for efficient inference. A practical direction of efficiency improvement is to switch the LRM between thinking and non-thinking modes dynamically. However, such approaches often introduce additional reasoning errors and lack statistical guarantees for the performance loss, which are critical for high-stakes applications. In this work, we propose Probably Approximately Correct (PAC) reasoning that controls the performance loss under the user-specified tolerance. Specifically, we construct an upper confidence bound on the performance loss and determine a threshold for switching to the non-thinking model. Theoretically, using the threshold to switch between the thinking and non-thinking modes ensures bounded performance loss in a distribution-free manner. Our comprehensive experiments on reasoning benchmarks show that the proposed method can save computational budgets and control the user-specified performance loss.

Hao Zeng, Jianguo Huang, Bingyi Jing, Hongxin Wei, Bo An• 2025

Related benchmarks

TaskDatasetResultRank
Open-domain taskArena Hard
Error (%)8.63
12
Open-domain taskArena-Hard (test)
Error13.16
12
Scientific ReasoningGPQA (test)
Error Rate (%)11.44
6
ReasoningGPQA
Error Rate (%)10.86
6
Logical reasoningZebraLogic (test)
Error (%)17.05
6
Mathematical ReasoningMATH-500 (test)
Error Rate36.52
6
ReasoningMATH 500
Error Rate3.08
6
ReasoningZebraLogic
Error Rate (%)4.76
6
Showing 8 of 8 rows

Other info

Follow for update