CRISP: Compressed Reasoning via Iterative Self-Policy Distillation

About

Reasoning models often generate far more tokens than a task requires, which raises inference cost and can compound errors. We introduce CRISP (Compressed Reasoning via Iterative Self-Policy Distillation), an on-policy self-distillation method that teaches a model to reason more concisely by distilling its own concise behavior back into itself. The method uses a single idea: condition the same model on a "be concise" instruction to obtain teacher logits, then minimize the per-token reverse KL divergence between the student and this teacher on the student's own rollouts. It requires no ground-truth answers, no token budgets, and no difficulty estimators. The reverse-KL objective is naturally difficulty-adaptive: it compresses easy problems aggressively while preserving the reasoning steps that hard problems require. On Qwen3-14B, CRISP cuts reasoning length by up to 56% on MATH-500 and 38% on the harder AIME 2024, while improving MATH-500 accuracy by up to 3.3 points over the base model and holding AIME 2024 accuracy within about one point. This behavior generalizes across model sizes and families: Qwen3-8B shows the same compression with accuracy preserved, and DeepSeek-R1-Distill-Llama-8B improves accuracy on all five benchmarks while shortening its responses. General capabilities are preserved across all three models. Code is available at https://github.com/HJSang/OPSD_Reasoning_Compression.
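The per-token reverse KL objective described above can be illustrated with a minimal sketch in plain Python. It assumes the student and teacher logits for each position of a student rollout are already available as lists of floats; in CRISP the teacher logits would come from the same model conditioned on a "be concise" instruction. The function names (`log_softmax`, `reverse_kl_per_token`, `sequence_loss`) and the mean-over-tokens reduction are illustrative assumptions, not taken from the authors' code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl_per_token(student_logits: Sequence[float],
                         teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single token position.

    "Reverse" means the expectation is taken under the student
    distribution, so the student is penalized for placing mass where
    the teacher places little.
    """
    log_p = log_softmax(student_logits)   # student log-probabilities
    log_q = log_softmax(teacher_logits)   # teacher log-probabilities
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def sequence_loss(student_seq: Sequence[Sequence[float]],
                  teacher_seq: Sequence[Sequence[float]]) -> float:
    """Mean per-token reverse KL over one rollout (assumed reduction)."""
    per_token = [reverse_kl_per_token(s, t)
                 for s, t in zip(student_seq, teacher_seq)]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy rollout: two positions, vocabulary of three tokens.
    student = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
    teacher = [[2.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
    print(sequence_loss(student, teacher))
```

In this toy example the second position contributes zero loss because the two distributions agree there, so only the first position, where student and teacher disagree, drives the update. A real training loop would compute the same quantity on tensors with automatic differentiation; how gradients are handled on the teacher side is not specified in the text above.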

Hejian Sang, Yuanda Xu, Zhengze Zhou, Ran He, Zhipeng Wang, Jiachen Sun • 2026

Related benchmarks

Task	Dataset	Result
Language Understanding	MMLU	Accuracy 76.9	844
Mathematical Reasoning	MATH 500	Top-1 Accuracy 86.93	452
Mathematical Reasoning	Minerva	Pass@1 Accuracy 34.47	289
Mathematical Reasoning	HMMT 2025	--	241
Mathematical Reasoning	AIME 24	Pass@1 Accuracy 76.04	153
Mathematical Reasoning	HMMT25	Accuracy (%) 42.92	115
Mathematical Reasoning	GSM8K	Accuracy (avg@K) 94.01	7
Mathematical Reasoning	AIME 2024	Avg@K Score 74.06	7
Mathematical Reasoning	MATH 500	Accuracy (avg@K) 86.94	7
Mathematical Reasoning	Minerva Math	Avg@K 32.84	7

Showing 10 of 25 rows
