Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression

About

Chain-of-Thought (CoT) reasoning successfully enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead for inference. Existing CoT compression methods often suffer from a critical loss of logical fidelity at high compression ratios, resulting in significant performance degradation. To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy. To generate reliable, high-fidelity supervision, we first train a dedicated semantically-preserved compressor on mathematical CoT data with fine-grained annotations. An LLM is then fine-tuned on these compressed pairs via a mixed-ratio supervised fine-tuning (SFT), teaching it to follow a spectrum of compression budgets and providing a stable initialization for reinforcement learning (RL). We further propose Constrained and Hierarchical Ratio Policy Optimization (CHRPO) to explicitly incentivize question-solving ability under lower budgets by a hierarchical reward. Experiments on three mathematical reasoning benchmarks show the superiority of Extra-CoT. For example, on MATH-500 using Qwen3-1.7B, Extra-CoT achieves over 73\% token reduction with an accuracy improvement of 0.6\%, significantly outperforming state-of-the-art (SOTA) methods. Our source codes have been released at https://github.com/Mwie1024/Extra-CoT.

Yuntian Tang, Bohan Jia, Wenxuan Huang, Lianyue Zhang, Jiao Xie, Wenxi Li, Wei Li, Jie Hu, Xinghao Chen Rongrong Ji, Shaohui Lin• 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	SVAMP	Accuracy93	403
Mathematical Reasoning	MultiArith	Accuracy99.4	143
General Knowledge (STEM)	MMLU STEM	Accuracy71.3	30
Mathematical Reasoning	GSM8K	Tokens210	30
Mathematical Reasoning	MATH 500	Tokens Used452	17
Mathematical Reasoning	AMC 2023	Tokens675	17
Mathematical Reasoning	MetaMath 1k	Token Count212	14
Chain-of-Thought Compression	MATH 500	Token Reduction (%)73	6

Showing 8 of 8 rows

Other info

Follow for update

@wizwand_team Discord