Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning

About

Long chain-of-thought (Long-CoT) reasoning improves accuracy in LLMs, yet its verbose, self-reflective style often hinders effective distillation into small language models (SLMs). We revisit Long-CoT compression through the lens of capability alignment and ask: Can pruning improve reasoning? We propose Prune-on-Logic, a structure-aware framework that transforms Long-CoT into logic graphs and selectively prunes low-utility reasoning steps under self-verification constraints. Through systematic analysis across three pruning strategies targeting entire chains, core reasoning, and verification, we find that verification pruning consistently improves accuracy while reducing token usage, whereas pruning reasoning steps or indiscriminate pruning degrades performance. Our study reveals that effective pruning aligns supervision with model capacity rather than merely shortening inputs. Gains hold across tasks, model scales, and CoT capability, with larger models benefiting more from pruning due to richer but more redundant reasoning. Our empirical findings highlight pruning as a structural optimization strategy for aligning CoT reasoning with SLM capacity.
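As a rough illustration of what verification pruning on a logic graph could look like, the sketch below represents a chain of thought as a small dependency graph and drops low-utility verification steps while reconnecting the steps that depended on them. Everything here is an assumption made for illustration: the `Step` class, the `kind` labels, the `utility` score, the `threshold` parameter, and the `prune_verification` function are hypothetical and are not taken from the paper. The abstract does not specify how utility is scored or how the self-verification constraint is enforced, so neither is modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One node of a toy logic graph (hypothetical structure, not the paper's)."""
    sid: int
    text: str
    kind: str                      # assumed labels: "reasoning" or "verification"
    deps: list[int] = field(default_factory=list)  # ids of steps this one builds on
    utility: float = 1.0           # assumed score; how it is computed is left open


def prune_verification(steps: list[Step], threshold: float = 0.5) -> list[Step]:
    """Drop verification steps whose utility is below `threshold`.

    Steps that depended on a pruned step are rewired to that step's nearest
    surviving ancestors. Assumes the graph is acyclic.
    """
    by_id = {s.sid: s for s in steps}
    pruned = {
        s.sid for s in steps
        if s.kind == "verification" and s.utility < threshold
    }

    def surviving(sid: int) -> list[int]:
        # Walk upward through pruned nodes until kept nodes are reached.
        if sid not in pruned:
            return [sid]
        found: list[int] = []
        for parent in by_id[sid].deps:
            for anc in surviving(parent):
                if anc not in found:
                    found.append(anc)
        return found

    kept: list[Step] = []
    for s in steps:
        if s.sid in pruned:
            continue
        new_deps: list[int] = []
        for d in s.deps:
            for anc in surviving(d):
                if anc not in new_deps:
                    new_deps.append(anc)
        kept.append(Step(s.sid, s.text, s.kind, new_deps, s.utility))
    return kept


def word_count(steps: list[Step]) -> int:
    """Crude stand-in for token usage: whitespace-separated words."""
    return sum(len(s.text.split()) for s in steps)


# Toy chain with one redundant self-check (made-up example data).
chain = [
    Step(0, "Let x = 3.", "reasoning"),
    Step(1, "So 2x = 6.", "reasoning", [0]),
    Step(2, "Wait, let me double-check: 2 * 3 = 6.", "verification", [1], 0.2),
    Step(3, "Therefore 2x + 1 = 7.", "reasoning", [2]),
]
kept = prune_verification(chain)
```

In this toy chain the self-check (step 2) is removed and step 3 is rewired to depend on step 1, so the remaining chain is shorter but still connected. A fuller treatment would also check that the pruned chain still supports the final answer before accepting it, which this sketch leaves out.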

Shangziqi Zhao, Jiahao Yuan, Jinyang Wu, Zhenglin Wang, Guisong Yang, Usman Naseem • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multimodal Reasoning | WeMath | Accuracy 63.4 | 43 |
| Multimodal Reasoning | MMStar | Accuracy 57.7 | 29 |
| Multimodal Reasoning | MathVista | Accuracy 45.9 | 29 |
| Multimodal Reasoning | R1-Onevision-Bench (Overall) | Accuracy 34.1 | 23 |
| Multimodal Reasoning | MMMU | Accuracy 55.7 | 8 |
| Multimodal Reasoning | R1-Onevision-Bench Math | Accuracy 25.4 | 8 |
| Multimodal Reasoning | R1-Onevision-Bench Physics | Accuracy 34.4 | 8 |
| Multimodal Reasoning | R1-Onevision-Bench Deduction | Accuracy 27.3 | 8 |
| Visual Information Preservation and Explainability Evaluation | Multimodal Reasoning Benchmarks (MathVista, WeMath, MMStar, MMMU, R1-Onevision-Bench) (test) | Visual Info Preservation Score 3.51 | 4 |