Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
About
Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks spanning textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by up to 27%-51% across five R1-style model series without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
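The idea of suppressing reflection tokens at inference time can be illustrated with a toy sketch. This is not the authors' implementation: the vocabulary, the keyword set, the logit values, and the helper names (`banned_token_ids`, `suppress`, `greedy_pick`) are all hypothetical, chosen only to show how masking a token's logit prevents a decoder from emitting it.

```python
import math

# Hypothetical toy vocabulary; a real model would use its tokenizer's vocabulary.
VOCAB = ["So", "Wait", "Hmm", "the", "answer", "is", "4", " wait"]

# Assumed keyword list for illustration; the abstract names "Wait" and "Hmm".
REFLECTION_KEYWORDS = {"wait", "hmm"}


def banned_token_ids(vocab, keywords):
    """Return ids of tokens whose stripped, lowercased form is a keyword."""
    return {i for i, tok in enumerate(vocab) if tok.strip().lower() in keywords}


def suppress(logits, banned_ids):
    """Return a copy of logits with banned entries set to -inf."""
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]


def greedy_pick(logits):
    """Return the index of the largest logit (greedy decoding for one step)."""
    return max(range(len(logits)), key=lambda i: logits[i])


banned = banned_token_ids(VOCAB, REFLECTION_KEYWORDS)

# Made-up logits for a single decoding step in which "Wait" scores highest.
step_logits = [1.0, 3.5, 2.0, 0.5, 2.5, 0.1, 0.3, 3.0]

baseline_token = VOCAB[greedy_pick(step_logits)]
suppressed_token = VOCAB[greedy_pick(suppress(step_logits, banned))]

print("baseline:", baseline_token)
print("with suppression:", suppressed_token)
```

In this toy step, greedy decoding picks "Wait" without suppression and falls back to the best remaining token once the reflection tokens are masked. With a real model, the same masking would be applied to the logits at every decoding step, typically through whatever logit-processing hook the inference framework provides.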
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Math Reasoning | AMC23 | Pass@1 Accuracy | 95 | 99 |
| Science Reasoning | ARC-C | Accuracy | 96.2 | 58 |
| Mathematical Reasoning | MATH500 | Accuracy | 88.7 | 57 |
| Math Reasoning | GSM8K | Pass@1 Accuracy | 96.3 | 57 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 70 | 54 |
| Science Reasoning | GPQA D | Accuracy | 68.2 | 52 |
| Math Reasoning | GSM8K | Accuracy | 95.8 | 49 |
| General Reasoning | Overall | Accuracy | 83.3 | 40 |
| Math Reasoning | AIME 2025 | Accuracy | 66.7 | 36 |
| Math and Science Reasoning | Average | Accuracy | 80.8 | 36 |