Conditional Performance Guarantee for Large Reasoning Models
About
Large reasoning models have shown strong performance through extended chain-of-thought reasoning, yet their computational cost remains significant. Probably approximately correct (PAC) reasoning provides statistical guarantees for efficient reasoning by adaptively switching between thinking and non-thinking models, but the guarantee holds only marginally, that is, on average over the input distribution, and does not provide exact conditional coverage. We propose G-PAC reasoning, a practical framework that provides PAC-style guarantees at the group level by partitioning the input space. We develop two instantiations: Group PAC (G-PAC) reasoning for known group structures and Clustered PAC (C-PAC) reasoning for unknown groupings. We prove that both G-PAC and C-PAC achieve group-conditional risk control, and that grouping can strictly improve efficiency over marginal PAC reasoning in heterogeneous settings. Our experiments on diverse reasoning benchmarks show that G-PAC and C-PAC achieve group-conditional risk control while maintaining substantial computational savings.
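As a rough illustration of group-level calibration (not the paper's algorithm), the sketch below assumes each calibration example carries a group label, an uncertainty score in [0, 1] for the non-thinking model, and a 0/1 loss recording whether the non-thinking answer differs from the thinking answer. For each group it picks the largest threshold from a fixed grid whose Hoeffding upper bound on the routed loss stays within a tolerance `epsilon`, splitting `delta` across the grid with a Bonferroni correction. The function names, the grid, and the choice of Hoeffding's inequality are all assumptions made for this example.

```python
import math
from collections import defaultdict


def calibrate_threshold(scores, losses, epsilon, delta, grid):
    """Return the largest grid threshold certified for one group, or None.

    Routed loss at threshold u is the mean of loss_i * 1[score_i <= u].
    Hypothetical helper: Hoeffding bound plus Bonferroni over a fixed grid.
    """
    n = len(scores)
    # Hoeffding slack for [0, 1] losses, with delta split over the grid.
    slack = math.sqrt(math.log(len(grid) / delta) / (2.0 * n))
    best = None
    for u in grid:
        emp_risk = sum(l for s, l in zip(scores, losses) if s <= u) / n
        if emp_risk + slack <= epsilon:
            best = u if best is None else max(best, u)
    return best


def calibrate_by_group(records, epsilon, delta, grid):
    """Calibrate one threshold per group from (group, score, loss) records."""
    by_group = defaultdict(list)
    for group, score, loss in records:
        by_group[group].append((score, loss))
    thresholds = {}
    for group, pairs in by_group.items():
        scores = [s for s, _ in pairs]
        losses = [l for _, l in pairs]
        thresholds[group] = calibrate_threshold(scores, losses, epsilon, delta, grid)
    return thresholds


def route(group, score, thresholds):
    """Use the non-thinking model only when the group's threshold allows it."""
    u = thresholds.get(group)
    if u is not None and score <= u:
        return "non-thinking"
    return "thinking"
```

Under these assumptions, an input from a group with no certified threshold, or from a group absent from the calibration set, falls back to the thinking model. Because each group is calibrated only on its own examples, a hard group cannot be masked by an easy one, which is the intuition behind group-conditional rather than marginal control.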
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Open-domain task | Arena Hard | Error (%) | 5.17 | 12 |
| Open-domain task | Arena-Hard (test) | Error | 12.61 | 12 |
| Logical reasoning | ZebraLogic (test) | Error (%) | 11.13 | 6 |
| Mathematical Reasoning | MATH-500 (test) | Error Rate | 32.16 | 6 |
| Reasoning | MATH 500 | Error Rate | 2.57 | 6 |
| Reasoning | ZebraLogic | Error Rate (%) | 4.01 | 6 |
| Reasoning | GPQA | Error Rate (%) | 10.82 | 6 |
| Scientific Reasoning | GPQA (test) | Error Rate (%) | 11.9 | 6 |