Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems

About

Recently, slow-thinking reasoning systems, such as o1, have demonstrated remarkable capabilities in solving complex reasoning tasks. These systems typically engage in an extended thinking process before responding to a query, allowing them to generate more thorough, accurate, and well-reasoned solutions. They are primarily developed and maintained by industry, and their core techniques are not publicly disclosed. In response, a growing number of studies from the research community aim to explore the technical foundations underlying these powerful reasoning systems. Building on these prior efforts, this paper presents a reproduction report on implementing o1-like reasoning systems. We introduce an "imitate, explore, and self-improve" framework, denoted as STILL-2, as our primary technical approach to train the reasoning model. In the initial phase, we use distilled long-form thought data to fine-tune the reasoning model, enabling it to invoke a slow-thinking mode. The model is then encouraged to explore challenging problems by generating multiple rollouts, which yields a growing number of high-quality trajectories that lead to correct answers. Furthermore, the model undergoes self-improvement by iteratively refining its training dataset. To verify the effectiveness of this approach, we conduct extensive experiments on three challenging benchmarks. The experimental results demonstrate that our approach achieves competitive performance compared to industry-level reasoning systems on these benchmarks.
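The explore and self-improve phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `explore`, `self_improve_round`, and `make_toy_sampler` are hypothetical, the "model" is a toy random sampler over addition problems, correctness is checked by exact string match against a reference answer, and the fine-tuning step between rounds is left out entirely.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Trajectory = Tuple[str, str]      # (long-form thought, final answer)
Example = Tuple[str, str, str]    # (problem, thought, answer)


def explore(problem: str, gold: str,
            sample: Callable[[str], Trajectory],
            n_rollouts: int) -> List[Trajectory]:
    """Generate several rollouts and keep those ending in the reference answer."""
    rollouts = [sample(problem) for _ in range(n_rollouts)]
    return [t for t in rollouts if t[1] == gold]


def self_improve_round(dataset: List[Example],
                       problems: List[Tuple[str, str]],
                       sample: Callable[[str], Trajectory],
                       n_rollouts: int = 8) -> List[Example]:
    """Add newly found correct trajectories to the training set, skipping duplicates.

    In a real system the model would be fine-tuned on `dataset` after each
    round; that step is omitted in this sketch.
    """
    seen = set(dataset)
    for problem, gold in problems:
        for thought, answer in explore(problem, gold, sample, n_rollouts):
            item = (problem, thought, answer)
            if item not in seen:
                seen.add(item)
                dataset.append(item)
    return dataset


def make_toy_sampler(seed: int = 0, p_correct: float = 0.5) -> Callable[[str], Trajectory]:
    """Toy stand-in for a reasoning model: solves 'a+b' correctly with probability p_correct."""
    rng = random.Random(seed)

    def sample(problem: str) -> Trajectory:
        a, b = map(int, problem.split("+"))
        answer = a + b if rng.random() < p_correct else a + b + 1
        thought = f"thinking about {problem} (draw {rng.randint(0, 999)})"
        return thought, str(answer)

    return sample


if __name__ == "__main__":
    problems = [("1+2", "3"), ("10+5", "15")]
    data = self_improve_round([], problems, make_toy_sampler(seed=0))
    print(f"kept {len(data)} correct trajectories")
```

Filtering rollouts by answer correctness is one simple way to select trajectories; the abstract does not specify the selection criterion, so exact match is an assumption made here for the sake of a runnable example.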

Yingqian Min, Zhipeng Chen, Jinhao Jiang, Jie Chen, Jia Deng, Yiwen Hu, Yiru Tang, Jiapeng Wang, Xiaoxue Cheng, Huatong Song, Wayne Xin Zhao, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | AIME 2024 | Accuracy | 20 | 479 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 16.7 | 311 |
| Mathematical Reasoning | Minerva | Pass@1 Accuracy | 14 | 289 |
| Mathematical Reasoning | OlympiadBench | Accuracy | 29.4 | 213 |
| Mathematical Reasoning | AMC23 | Pass@1 Accuracy | 23.8 | 207 |
| Mathematical Reasoning | MATH 500 | Accuracy | 72.4 | 116 |
| Mathematical Reasoning | LiveMathBench | Accuracy | 3 | 19 |
| Mathematical Reasoning | KSAT 2025 | Accuracy | 53.3 | 15 |
| Mathematical Reasoning | Minerva | Accuracy | 23.2 | 15 |
| Mathematical Reasoning | MATH-OAI | Accuracy | 90.2 | 9 |