Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
About
Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks spanning textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by 27%-51% across five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
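The core mechanism described above is suppressing specific tokens at decoding time. The following is a minimal sketch of that idea in plain Python, using hypothetical token ids for "Wait" and "Hmm"; the actual method operates on a model's logits through the tokenizer's vocabulary, and the keyword list and ids shown here are illustrative assumptions, not the paper's implementation.

```python
import math

def suppress_tokens(logits, banned_ids):
    """Mask out banned token ids by setting their logits to -inf,
    so they can never be selected during decoding."""
    out = list(logits)
    for i in banned_ids:
        out[i] = -math.inf  # probability becomes zero after softmax
    return out

# Hypothetical vocabulary ids for the self-reflection tokens
# (real ids depend on the model's tokenizer).
WAIT_ID, HMM_ID = 2, 5

def greedy_next_token(logits, banned_ids):
    """Greedy decoding step: pick the highest-logit token
    after suppression has been applied."""
    masked = suppress_tokens(logits, banned_ids)
    return max(range(len(masked)), key=lambda i: masked[i])
```

For example, if the "Wait" token (id 2) happens to have the highest raw logit, greedy decoding with suppression skips it and selects the next-best token instead, which is how the chain of thought continues without an explicit self-reflection step.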
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Math Reasoning | AMC23 | Pass@1 Accuracy | 95 | 68 |
| Mathematical Reasoning | MATH500 | Accuracy | 88.7 | 57 |
| General Reasoning | Overall | Accuracy | 83.3 | 40 |
| Math Reasoning | GSM8K | Pass@1 Accuracy | 96.3 | 36 |
| Math Reasoning | AIME 24 | Pass@1 Score | 66.7 | 36 |
| Math Reasoning | MATH 500 | Pass@1 | 93.8 | 36 |
| Math Reasoning | AIME 25 | Pass@1 | 63.3 | 33 |
| Math Reasoning | Olympiad | Pass@1 | 62.6 | 30 |
| Question Answering | GPQA | Accuracy | 38.9 | 22 |
| Mathematical Reasoning | AMC | Accuracy | 97.5 | 15 |