CRISP: Compressed Reasoning via Iterative Self-Policy Distillation

About

Reasoning models think out loud, but much of what they say is noise. We introduce CRISP (Compressed Reasoning via Iterative Self-Policy Distillation), a method that teaches models to reason more concisely by distilling their own concise behavior back into themselves. The entire approach reduces to one idea: condition the same model on a "be concise" instruction to obtain teacher logits, and minimize per-token reverse KL on the student's own rollouts. No ground-truth answers, no token budgets, no difficulty estimators. Just self-distillation. Yet this simplicity belies surprising sophistication: CRISP automatically compresses easy problems aggressively while preserving the deliberation needed for hard ones. On Qwen3-8B and Qwen3-14B, we achieve 57–59% token reduction on MATH-500 while improving accuracy by 9–16 points absolute. On AIME 2024, the 14B model gains 10 points with 41% compression. Ablations show that qualitative conciseness instructions outperform explicit token targets, and periodic teacher refreshes yield a broad stable regime. The method generalizes across model families (DeepSeek-R1-Distill-Llama-8B improves accuracy by up to 5 points with 17–32% compression) and transfers beyond math to multi-step agentic planning (DeepPlanning), reducing token usage by 42–51% while preserving planning quality. Code is available at https://github.com/HJSang/OPSD_Reasoning_Compression.
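The core objective described in the abstract, per-token reverse KL between the student and a conciseness-conditioned teacher on the student's own rollout, can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It is a plain-Python illustration under stated assumptions: the function names are hypothetical, the KL is computed over the full vocabulary at each position, the per-token values are averaged into one scalar, and the logits are toy numbers rather than model outputs. In the setting the abstract describes, the teacher logits would come from the same model scoring the same rollout tokens with a "be concise" instruction added to the prompt.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl_per_token(
    student_logits: List[List[float]],
    teacher_logits: List[List[float]],
) -> List[float]:
    """KL(student || teacher) at each position of a student-sampled rollout.

    Both inputs have shape [num_tokens][vocab_size]. "Reverse" means the
    expectation is taken under the student distribution.
    """
    kls = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_p = log_softmax(s_row)  # student log-probabilities
        log_q = log_softmax(t_row)  # teacher log-probabilities
        kls.append(sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q)))
    return kls


def self_distillation_loss(
    student_logits: List[List[float]],
    teacher_logits: List[List[float]],
) -> float:
    """Mean per-token reverse KL (averaging over tokens is an assumption)."""
    per_token = reverse_kl_per_token(student_logits, teacher_logits)
    return sum(per_token) / len(per_token)


# Toy rollout: 2 positions, vocabulary of 3 tokens (made-up numbers).
student = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
teacher = [[2.0, 0.5, -1.0], [1.0, 0.0, -1.0]]

per_token = reverse_kl_per_token(student, teacher)
loss = self_distillation_loss(student, teacher)
print(per_token, loss)
```

At the first position the two distributions coincide, so the divergence is zero. At the second, the student is uniform while the teacher prefers the first token, so the divergence is positive and only that position contributes to the loss. In actual training the teacher logits would be treated as fixed targets and the loss backpropagated through the student only; that detail is likewise assumed here rather than taken from the abstract.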

Hejian Sang, Yuanda Xu, Zhengze Zhou, Ran He, Zhipeng Wang, Jiachen Sun • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 76.9 | 844 |
| Mathematical Reasoning | MATH 500 | Top-1 Accuracy | 86.93 | 384 |
| Mathematical Reasoning | Minerva | Pass@1 Accuracy | 34.47 | 289 |
| Mathematical Reasoning | HMMT 2025 | -- | -- | 194 |
| Mathematical Reasoning | AIME 24 | Pass@1 Accuracy | 76.04 | 128 |
| Mathematical Reasoning | HMMT25 | Accuracy (%) | 42.92 | 115 |
| Mathematical Reasoning | GSM8K | Accuracy (avg@K) | 94.01 | 7 |
| Mathematical Reasoning | AIME 2024 | Avg @K Score | 74.06 | 7 |
| Mathematical Reasoning | MATH 500 | Accuracy (avg@K) | 86.94 | 7 |
| Mathematical Reasoning | Minerva Math | Avg@K | 32.84 | 7 |

Showing 10 of 25 rows
