FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

About

Recent Large Reasoning Models (LRMs) such as DeepSeek-R1 have demonstrated remarkable success in complex reasoning tasks, exhibiting human-like patterns in exploring multiple alternative solutions. Upon closer inspection, however, we uncover a surprising phenomenon: The First is The Best, where alternative solutions are not merely suboptimal but potentially detrimental. This observation challenges widely accepted test-time scaling laws, leading us to hypothesize that errors within the reasoning path scale concurrently with test time. Through comprehensive empirical analysis, we characterize these errors as forming a forest structure, the Forest of Errors (FoE), and conclude that FoE makes the First the Best, a conclusion underpinned by rigorous theoretical analysis. Leveraging these insights, we propose RED, a self-guided efficient reasoning framework comprising two components: I) Refining First, which suppresses FoE growth in the first solution; and II) Discarding Subs, which prunes subsequent FoE via dual-consistency. Extensive experiments across five benchmarks and six backbone models demonstrate that RED outperforms eight competitive baselines, achieving performance gains of up to 19.0% while reducing token consumption by 37.7% to 70.4%. Moreover, comparative experiments on FoE metrics shed light on why RED is effective.

Kehan Jiang, Haonan Dong, Zhaolu Kang, Zhengzhou Zhu, Guojie Song • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | MATH 500 | -- | 76 |
| Mathematical Reasoning | AIME 24 | Pass@1: 80 | 54 |
| Mathematical Reasoning | AIME 25 | Pass@1 Accuracy: 72.2 | 54 |
| Mathematical Reasoning | GSM8K | Pass@1 Accuracy: 95 | 54 |
| Scientific Reasoning | GPQA Diamond | Pass@1 Accuracy: 66.8 | 54 |
| Mathematical Reasoning | AIME24 | Total Inference Runtime (mm:ss): 0.00e+0 | 36 |
| Mathematical Reasoning | GSM8K | Total Inference Time (mm:ss): 0.00e+0 | 36 |
| Mathematical Reasoning | AIME 25 | Total Inference Runtime (s): 1 | 36 |
| Scientific Reasoning | GPQA Diamond | Total Inference Runtime (s): 2 | 36 |
| Mathematical Reasoning | AIME25 | Pass@1: 72.2 | 15 |
Showing 10 of 13 rows
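
This page reports Pass@1 figures without defining the metric. As a rough illustration only, the sketch below assumes the common convention of the unbiased pass@k estimator with k = 1, averaged over problems and expressed as a percentage. The listing does not confirm that the paper computes it this way. The function names and the per-problem `(n, c)` input format are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of sampled answers, c: number of correct ones,
    k: sampling budget being scored (assumed convention, not from the listing).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct answer.
        return 1.0
    # 1 minus the probability that k draws without replacement are all wrong.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems, as a percentage.

    results: hypothetical list of (n_samples, n_correct) pairs, one per problem.
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Three made-up problems with four samples each; not data from the paper.
    print(benchmark_pass_at_1([(4, 4), (4, 2), (4, 0)]))
```

With k = 1 the estimator reduces to c / n per problem, so the benchmark score is simply the mean fraction of correct samples.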
