Think When You Need: Self-Adaptive Chain-of-Thought Learning

About

Chain-of-Thought (CoT) reasoning enhances language models' performance but often leads to inefficient "overthinking" on simple problems. We observe that existing approaches, which directly penalize reasoning length, fail to account for varying problem complexity. Our approach constructs rewards through length and quality comparisons, guided by theoretical assumptions that jointly enhance solution correctness and conciseness. We further extend our method to fuzzy tasks where ground truth is unavailable. Experiments across multiple reasoning benchmarks demonstrate that our method maintains accuracy while generating significantly more concise explanations, effectively teaching models to "think when needed."
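The abstract does not give the reward formula, so the following is only a minimal sketch of what a reward built from length and quality comparisons could look like. It assumes responses are sampled in groups for the same prompt and compared pairwise within the group. The names `Sample`, `pairwise_rewards`, and `length_weight`, as well as the specific scoring rule, are hypothetical illustrations and not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled response to a prompt (hypothetical container)."""
    correct: bool  # whether the final answer matches the ground truth
    length: int    # reasoning length, e.g. in tokens


def pairwise_rewards(samples, length_weight=0.5):
    """Score each sample by comparing it with every other sample in its group.

    Assumed rule (illustrative only):
      - a correct sample earns 1.0 for each incorrect sample it is compared with;
      - among two correct samples, the shorter one earns `length_weight`.
    Scores are averaged over the number of comparisons.
    """
    n = len(samples)
    rewards = []
    for i, a in enumerate(samples):
        score = 0.0
        for j, b in enumerate(samples):
            if i == j:
                continue
            if a.correct and not b.correct:
                score += 1.0            # quality comparison
            elif a.correct and b.correct and a.length < b.length:
                score += length_weight  # length comparison between correct answers
        rewards.append(score / max(n - 1, 1))
    return rewards


# Three responses to the same prompt: short correct, long correct, short incorrect.
group = [Sample(True, 120), Sample(True, 480), Sample(False, 60)]
rewards = pairwise_rewards(group)
print(rewards)
```

Because lengths are compared only between responses to the same prompt, a long answer to a hard problem is never measured against a short answer to an easy one. Under this assumed rule, an incorrect response earns nothing regardless of how short it is.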

Junjie Yang, Ke Lin, Xing Yu • 2025

Related benchmarks

Task	Dataset	Result
Long-context Reasoning	LongBench v2	--	113
Mathematical Reasoning	AIME 25	AUCOAA 79.6	11
Code Generation	LiveCodeBench	AUCOAA 93.9	11
Mathematical Reasoning	MATH 500	AUCOAA 89.6	11
Mathematical Reasoning	AIME 24	AUCOAA 74.5	11
Commonsense Reasoning	CommonsenseQA	AUCOAA 74.8	11
Science Reasoning	GPQA Diamond	AUCOAA 67.1	11

