Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

About

Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This behavior leads to inadequate depth of reasoning and decreased performance, particularly on challenging mathematical problems. To systematically analyze this issue, we conduct experiments on three challenging test sets and two representative open-source o1-like models, revealing that frequent thought switching correlates with incorrect responses. We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. To address underthinking, we propose a decoding strategy with a thought switching penalty (TIP) that discourages premature transitions between thoughts, encouraging deeper exploration of each reasoning path. Experimental results demonstrate that our approach improves accuracy across challenging datasets without requiring model fine-tuning. Our findings contribute to understanding reasoning inefficiencies in o1-like LLMs and offer a practical solution to enhance their problem-solving capabilities.
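To make the idea of a thought switching penalty concrete, here is a minimal sketch of one way such a penalty could be applied at decoding time. The abstract does not spell out the mechanism, so everything here is an assumption for illustration: the function names, the idea of lowering the logits of designated "switch" tokens, the fixed window counted from the start of the current thought, and the default values of `penalty` and `window` are all hypothetical.

```python
from typing import Dict, Set


def apply_switch_penalty(
    logits: Dict[int, float],
    switch_token_ids: Set[int],
    steps_since_thought_start: int,
    penalty: float = 3.0,   # hypothetical strength, not taken from the paper
    window: int = 600,      # hypothetical duration in decoding steps
) -> Dict[int, float]:
    """Return a copy of `logits` with switch tokens penalized.

    Assumption: tokens that signal a change of thought are discouraged only
    while the current thought is younger than `window` steps.
    """
    if steps_since_thought_start >= window:
        # The thought has been explored long enough; leave logits untouched.
        return dict(logits)
    return {
        token_id: score - penalty if token_id in switch_token_ids else score
        for token_id, score in logits.items()
    }


def greedy_pick(logits: Dict[int, float]) -> int:
    """Pick the token id with the highest logit."""
    return max(logits, key=logits.get)
```

In this sketch, a switch token that would win under plain greedy decoding loses to a continuation token while the penalty is active, and wins again once the window has elapsed. Because only the logits are adjusted, no model weights change, which is consistent with the abstract's statement that the approach requires no fine-tuning.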

Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	AIME 24	Accuracy 76.7	358
Mathematical Reasoning	AMC23	PASS@1 Accuracy 100	216
Mathematical Reasoning	GSM8K	Accuracy 96.4	166
Mathematical Reasoning	AMC 2023	Accuracy 85	144
Mathematical Reasoning	MATH500	Accuracy 93.8	104
Code Generation	LiveCodeBench	Accuracy 63	64
Mathematical Reasoning	AIME 2025	Accuracy 33	59
Mathematical Reasoning	MATH500	Accuracy 90.5	57
Mathematical Reasoning	MATH 500	Mean@10.88	55
Mathematical Reasoning	MATH 500	Pass@1 Rate 92.4	26

