
Conditional Performance Guarantee for Large Reasoning Models

About

Large reasoning models have shown strong performance through extended chain-of-thought reasoning, yet their computational cost remains significant. Probably approximately correct (PAC) reasoning provides statistical guarantees for efficient reasoning by adaptively switching between thinking and non-thinking models, but the guarantee holds only in the marginal case and does not provide exact conditional coverage. We propose G-PAC reasoning, a practical framework that provides PAC-style guarantees at the group level by partitioning the input space. We develop two instantiations: Group PAC (G-PAC) reasoning for known group structures and Clustered PAC (C-PAC) reasoning for unknown groupings. We prove that both G-PAC and C-PAC achieve group-conditional risk control, and that grouping can strictly improve efficiency over marginal PAC reasoning in heterogeneous settings. Our experiments on diverse reasoning benchmarks demonstrate that G-PAC and C-PAC successfully achieve group-conditional risk control while maintaining substantial computational savings.
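To make the idea of group-conditional risk control concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each calibration example carries a group label, an uncertainty score for the non-thinking model's answer, and a loss in [0, 1] measuring how much worse that answer is than the thinking model's. It then picks, separately for each group, the largest score threshold whose Hoeffding upper confidence bound on the risk stays below a tolerance. The names `calibrate_group_thresholds`, `route`, `epsilon`, and `delta` are hypothetical, and the Hoeffding bound is a stand-in for whatever bound the paper actually uses.

```python
import math
from collections import defaultdict


def hoeffding_ucb(mean, n, delta):
    """One-sided Hoeffding upper confidence bound for a mean of n values in [0, 1]."""
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def calibrate_group_thresholds(groups, scores, losses, epsilon, delta):
    """Pick one routing threshold per group (illustrative assumption, not the paper's method).

    An input is answered by the non-thinking model when its score is <= threshold,
    so the risk at threshold t is the group mean of loss * 1[score <= t].
    """
    by_group = defaultdict(list)
    for g, u, l in zip(groups, scores, losses):
        by_group[g].append((u, l))

    thresholds = {}
    for g, pairs in by_group.items():
        pairs.sort()
        n = len(pairs)
        best = -math.inf  # default: send every input to the thinking model
        cum_loss = 0.0
        i = 0
        while i < n:
            t = pairs[i][0]
            # Absorb every example tied at score t before testing the threshold.
            while i < n and pairs[i][0] == t:
                cum_loss += pairs[i][1]
                i += 1
            # Risk is non-decreasing in t, so stop at the first failing threshold.
            if hoeffding_ucb(cum_loss / n, n, delta) <= epsilon:
                best = t
            else:
                break
        thresholds[g] = best
    return thresholds


def route(group, score, thresholds):
    """Route to the cheap model only when the group's calibrated threshold allows it."""
    t = thresholds.get(group, -math.inf)
    return "non-thinking" if score <= t else "thinking"
```

In this sketch a group where the non-thinking model is rarely wrong ends up with a high threshold and saves most of the compute, while a group where it is often wrong gets a low threshold and is mostly routed to the thinking model. A single pooled threshold has to compromise between such groups, which matches the efficiency argument for heterogeneous settings made above.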

Jianguo Huang, Hao Zeng, Bingyi Jing, Hongxin Wei, Bo An • 2026

Related benchmarks

Task                    Dataset            Metric          Result  Rank
Open-domain task        Arena Hard         Error (%)       5.17    12
Open-domain task        Arena-Hard (test)  Error           12.61   12
Logical reasoning       ZebraLogic (test)  Error (%)       11.13   6
Mathematical Reasoning  MATH-500 (test)    Error Rate      32.16   6
Reasoning               MATH 500           Error Rate      2.57    6
Reasoning               ZebraLogic         Error Rate (%)  4.01    6
Reasoning               GPQA               Error Rate (%)  10.82   6
Scientific Reasoning    GPQA (test)        Error Rate (%)  11.9    6
