Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

On the Provable Performance Guarantee of Efficient Reasoning Models

About

Large reasoning models (LRMs) have achieved remarkable progress in complex problem-solving tasks. Despite this success, LRMs typically suffer from high computational costs during deployment, highlighting a need for efficient inference. A practical direction of efficiency improvement is to switch the LRM between thinking and non-thinking modes dynamically. However, such approaches often introduce additional reasoning errors and lack statistical guarantees for the performance loss, which are critical for high-stakes applications. In this work, we propose Probably Approximately Correct (PAC) reasoning that controls the performance loss under the user-specified tolerance. Specifically, we construct an upper confidence bound on the performance loss and determine a threshold for switching to the non-thinking model. Theoretically, using the threshold to switch between the thinking and non-thinking modes ensures bounded performance loss in a distribution-free manner. Our comprehensive experiments on reasoning benchmarks show that the proposed method can save computational budgets and control the user-specified performance loss.

Hao Zeng, Jianguo Huang, Bingyi Jing, Hongxin Wei, Bo An• 2025

Related benchmarks

TaskDatasetResultRank
Open-domain taskArena Hard
Error (%)8.63
12
Open-domain taskArena-Hard (test)
Error13.16
12
Scientific ReasoningGPQA (test)
Error Rate (%)11.44
6
ReasoningGPQA
Error Rate (%)10.86
6
Logical reasoningZebraLogic (test)
Error (%)17.05
6
Mathematical ReasoningMATH-500 (test)
Error Rate36.52
6
ReasoningMATH 500
Error Rate3.08
6
ReasoningZebraLogic
Error Rate (%)4.76
6
Showing 8 of 8 rows

Other info

Follow for update