CRISP: Compressed Reasoning via Iterative Self-Policy Distillation
About
Reasoning models think out loud, but much of what they say is noise. We introduce CRISP (Compressed Reasoning via Iterative Self-Policy Distillation), a method that teaches models to reason more concisely by distilling their own concise behavior back into themselves. The entire approach reduces to one idea: condition the same model on a "be concise" instruction to obtain teacher logits, and minimize per-token reverse KL on the student's own rollouts. No ground-truth answers, no token budgets, no difficulty estimators. Just self-distillation. Yet this simplicity belies surprising sophistication: CRISP automatically compresses easy problems aggressively while preserving the deliberation needed for hard ones. On Qwen3-8B and Qwen3-14B, we achieve a 57-59% token reduction on MATH-500 while improving accuracy by 9 to 16 absolute points. On AIME 2024, the 14B model gains 10 points with 41% compression. Ablations show that qualitative conciseness instructions outperform explicit token targets, and that periodic teacher refreshes yield a broad, stable regime. The method generalizes across model families (DeepSeek-R1-Distill-Llama-8B improves accuracy by up to 5 points with 17-32% compression) and transfers beyond math to multi-step agentic planning (DeepPlanning), reducing token usage by 42-51% while preserving planning quality. Code is available at https://github.com/HJSang/OPSD_Reasoning_Compression.
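The core objective above can be illustrated with a minimal sketch in plain Python. It assumes that "reverse KL" means KL(student ‖ teacher), summed over the vocabulary at each position of a student-sampled rollout and then averaged over positions. The function names are hypothetical and are not taken from the linked repository. In an actual training loop, the teacher logits would come from the same model prompted with the "be concise" instruction, they would be held fixed (no gradient), and the loss would be computed with a tensor library rather than Python lists.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single token position.

    Reverse KL weights the log-ratio by the *student's* probabilities, so the
    student is penalized for putting mass on tokens the teacher considers unlikely.
    """
    log_p = log_softmax(student_logits)  # student: plain prompt
    log_q = log_softmax(teacher_logits)  # teacher: same model + "be concise" (assumed)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def crisp_style_loss(
    student_seq_logits: Sequence[Sequence[float]],
    teacher_seq_logits: Sequence[Sequence[float]],
) -> float:
    """Mean per-token reverse KL over one student rollout (hypothetical helper).

    student_seq_logits[t]: student logits given (prompt, y_<t).
    teacher_seq_logits[t]: teacher logits given (concise instruction + prompt, y_<t),
        scored on the *same* student-generated tokens y and treated as a fixed target.
    """
    if len(student_seq_logits) != len(teacher_seq_logits):
        raise ValueError("student and teacher must score the same rollout")
    if not student_seq_logits:
        raise ValueError("rollout must contain at least one token")
    per_token = [
        reverse_kl(s, t) for s, t in zip(student_seq_logits, teacher_seq_logits)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy 3-token vocabulary, 2-step rollout: the first position agrees exactly,
    # the second does not, so only the second position contributes to the loss.
    student = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
    teacher = [[2.0, 1.0, 0.1], [3.0, 0.0, 0.0]]
    print(crisp_style_loss(student, teacher))
```

The loss is zero when the student and the concise-conditioned teacher agree at every position, and it grows as the student spreads probability onto tokens the teacher down-weights. Because the rollouts are sampled from the student itself, this is an on-policy distillation signal that requires no reference answers.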
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 76.9 | 844 |
| Mathematical Reasoning | MATH 500 | Top-1 Accuracy | 86.93 | 384 |
| Mathematical Reasoning | Minerva | Pass@1 Accuracy | 34.47 | 289 |
| Mathematical Reasoning | HMMT 2025 | -- | -- | 194 |
| Mathematical Reasoning | AIME 24 | Pass@1 Accuracy | 76.04 | 128 |
| Mathematical Reasoning | HMMT25 | Accuracy (%) | 42.92 | 115 |
| Mathematical Reasoning | GSM8K | Accuracy (avg@K) | 94.01 | 7 |
| Mathematical Reasoning | AIME 2024 | Avg@K Score | 74.06 | 7 |
| Mathematical Reasoning | MATH 500 | Accuracy (avg@K) | 86.94 | 7 |
| Mathematical Reasoning | Minerva Math | Avg@K | 32.84 | 7 |