On the Provable Performance Guarantee of Efficient Reasoning Models
About
Large reasoning models (LRMs) have achieved remarkable progress in complex problem-solving tasks. Despite this success, LRMs typically suffer from high computational costs during deployment, highlighting a need for efficient inference. A practical direction of efficiency improvement is to switch the LRM between thinking and non-thinking modes dynamically. However, such approaches often introduce additional reasoning errors and lack statistical guarantees for the performance loss, which are critical for high-stakes applications. In this work, we propose Probably Approximately Correct (PAC) reasoning that controls the performance loss under the user-specified tolerance. Specifically, we construct an upper confidence bound on the performance loss and determine a threshold for switching to the non-thinking model. Theoretically, using the threshold to switch between the thinking and non-thinking modes ensures bounded performance loss in a distribution-free manner. Our comprehensive experiments on reasoning benchmarks show that the proposed method can save computational budgets and control the user-specified performance loss.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Open-domain task | Arena Hard | Error (%)8.63 | 12 | |
| Open-domain task | Arena-Hard (test) | Error13.16 | 12 | |
| Scientific Reasoning | GPQA (test) | Error Rate (%)11.44 | 6 | |
| Reasoning | GPQA | Error Rate (%)10.86 | 6 | |
| Logical reasoning | ZebraLogic (test) | Error (%)17.05 | 6 | |
| Mathematical Reasoning | MATH-500 (test) | Error Rate36.52 | 6 | |
| Reasoning | MATH 500 | Error Rate3.08 | 6 | |
| Reasoning | ZebraLogic | Error Rate (%)4.76 | 6 |