Conditional Performance Guarantee for Large Reasoning Models

About

Large reasoning models have shown strong performance through extended chain-of-thought reasoning, yet their computational cost remains significant. Probably approximately correct (PAC) reasoning provides statistical guarantees for efficient reasoning by adaptively switching between thinking and non-thinking models, but the guarantee holds only in the marginal case and does not provide exact conditional coverage. We propose G-PAC reasoning, a practical framework that provides PAC-style guarantees at the group level by partitioning the input space. We develop two instantiations: Group PAC (G-PAC) reasoning for known group structures and Clustered PAC (C-PAC) reasoning for unknown groupings. We prove that both G-PAC and C-PAC achieve group-conditional risk control, and that grouping can strictly improve efficiency over marginal PAC reasoning in heterogeneous settings. Our experiments on diverse reasoning benchmarks demonstrate that G-PAC and C-PAC successfully achieve group-conditional risk control while maintaining substantial computational savings.

Jianguo Huang, Hao Zeng, Bingyi Jing, Hongxin Wei, Bo An • 2026
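
To make the idea of group-level guarantees in the abstract concrete, here is a minimal Python sketch of per-group threshold calibration. It is not the authors' algorithm. It assumes each calibration input carries a known group label, an uncertainty score from the non-thinking model, and a loss in [0, 1] that would be incurred if the non-thinking answer were used (inputs routed to the thinking model are assumed to incur zero loss). The Hoeffding bound is a swapped-in, simple confidence bound chosen for illustration, and the names `calibrate_threshold`, `calibrate_per_group`, `epsilon`, and `delta` are hypothetical.

```python
import math
from collections import defaultdict


def calibrate_threshold(scores, losses, epsilon, delta):
    """Return the largest score threshold whose risk bound stays <= epsilon.

    Inputs with score <= threshold would be answered by the non-thinking
    model. Losses are assumed to lie in [0, 1] and scores to be distinct.
    The risk is bounded with a Hoeffding upper confidence bound (an
    illustrative choice, not taken from the paper).
    """
    n = len(scores)
    if n == 0:
        return float("-inf")  # no data: always use the thinking model
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    order = sorted(range(n), key=lambda i: scores[i])
    threshold = float("-inf")
    cum_loss = 0.0
    for i in order:
        # Risk if every input up to and including i uses the fast model.
        cum_loss += losses[i]
        if cum_loss / n + slack > epsilon:
            break  # risk is monotone in the threshold, so stop here
        threshold = scores[i]
    return threshold


def calibrate_per_group(groups, scores, losses, epsilon, delta):
    """Calibrate one threshold per group instead of one global threshold."""
    by_group = defaultdict(lambda: ([], []))
    for g, s, l in zip(groups, scores, losses):
        by_group[g][0].append(s)
        by_group[g][1].append(l)
    return {
        g: calibrate_threshold(s, l, epsilon, delta)
        for g, (s, l) in by_group.items()
    }
```

Under these assumptions, a group where the non-thinking model is usually right receives a permissive threshold, while a harder group receives a stricter one. A single pooled threshold gives no such per-group control, which is the gap between marginal and group-conditional guarantees that the abstract describes.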

Related benchmarks

Task	Dataset	Result	(unlabeled)
Logical reasoning	ZebraLogic (test)	--	90
Mathematical Reasoning	MATH-500 (test)	--	46
Open-domain task	Arena Hard	Error (%) 5.17	12
Open-domain task	Arena-Hard (test)	Error 12.61	12
Reasoning	MATH 500	Error Rate 2.57	6
Reasoning	ZebraLogic	Error Rate (%) 4.01	6
Reasoning	GPQA	Error Rate (%) 10.82	6
Scientific Reasoning	GPQA (test)	Error Rate (%) 11.9	6
