Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems

About

Recently, slow-thinking reasoning systems such as o1 have demonstrated remarkable capabilities in solving complex reasoning tasks. These systems typically engage in an extended thinking process before responding to a query, which allows them to generate more thorough, accurate, and well-reasoned solutions. They are primarily developed and maintained by industry, and their core techniques have not been publicly disclosed. In response, a growing number of studies from the research community aim to explore the technical foundations underlying these powerful reasoning systems. Building on these prior efforts, this paper presents a reproduction report on implementing o1-like reasoning systems. We introduce an "imitate, explore, and self-improve" framework, denoted STILL-2, as our primary technical approach to training the reasoning model. In the initial phase, we fine-tune the reasoning model on distilled long-form thought data, enabling it to invoke a slow-thinking mode. The model is then encouraged to explore challenging problems by generating multiple rollouts, which can yield an increasing number of high-quality trajectories that lead to correct answers. Furthermore, the model undergoes self-improvement by iteratively refining its training dataset. To verify the effectiveness of this approach, we conduct extensive experiments on three challenging benchmarks. The experimental results demonstrate that our approach achieves competitive performance compared to industry-level reasoning systems on these benchmarks.
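The three phases described above (imitate, explore, self-improve) can be pictured as a rejection-sampling style training loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: `fine_tune_fn` and `make_sampler` are hypothetical stand-ins for supervised fine-tuning and model sampling, correctness is assumed to be an exact match on the final answer, and "refining the training dataset" is assumed to mean adding new, deduplicated correct trajectories. The toy model at the bottom (a lookup table plus random guessing) is purely illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# One training example: (question, long-form thought, final answer).
Example = Tuple[str, str, str]


@dataclass(frozen=True)
class Problem:
    question: str
    answer: str


def explore(problems: List[Problem],
            sample_fn: Callable[[str], Tuple[str, str]],
            num_rollouts: int) -> List[Example]:
    """Draw several rollouts per problem and keep only the trajectories
    whose final answer exactly matches the reference (assumed check)."""
    kept: List[Example] = []
    for p in problems:
        for _ in range(num_rollouts):
            thought, final = sample_fn(p.question)
            if final == p.answer:
                kept.append((p.question, thought, final))
    return kept


def imitate_explore_self_improve(
        seed_data: List[Example],
        problems: List[Problem],
        fine_tune_fn: Callable[[List[Example]], object],
        make_sampler: Callable[[object], Callable[[str], Tuple[str, str]]],
        num_rounds: int = 3,
        num_rollouts: int = 4):
    # Imitate: fine-tune on distilled long-form thought data.
    train_set = list(seed_data)
    model = fine_tune_fn(train_set)
    for _ in range(num_rounds):
        # Explore: sample multiple rollouts with the current model.
        new_examples = explore(problems, make_sampler(model), num_rollouts)
        # Self-improve: merge new correct trajectories (deduplicated), retrain.
        seen = set(train_set)
        for ex in new_examples:
            if ex not in seen:
                train_set.append(ex)
                seen.add(ex)
        model = fine_tune_fn(train_set)
    return model, train_set


# Hypothetical toy stand-ins: the "model" memorizes answers it was trained on
# and guesses randomly on unseen questions.
def toy_fine_tune(data: List[Example]) -> Dict[str, str]:
    return {q: a for q, _, a in data}


def toy_make_sampler(model: Dict[str, str], seed: int = 0):
    rng = random.Random(seed)

    def sample(question: str) -> Tuple[str, str]:
        thought = f"reasoning path {rng.randint(0, 9)}"
        if question in model:
            return thought, model[question]
        return thought, rng.choice(["0", "1", "2"])

    return sample


problems = [Problem("1+1", "2"), Problem("2-2", "0"), Problem("3-2", "1")]
seed = [("1+1", "distilled long-form thought", "2")]
model, data = imitate_explore_self_improve(seed, problems,
                                           toy_fine_tune, toy_make_sampler)
print(len(data), sorted(model.items()))
```

In this sketch the training set only ever grows with trajectories that pass the answer check, so each retraining round sees at least as much verified data as the previous one; the report's actual filtering and data-selection criteria may differ.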

Yingqian Min, Zhipeng Chen, Jinhao Jiang, Jie Chen, Jia Deng, Yiwen Hu, Yiru Tang, Jiapeng Wang, Xiaoxue Cheng, Huatong Song, Wayne Xin Zhao, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen (2024)

Related benchmarks

Task	Dataset	Metric	Result (%)	(unlabeled)
Mathematical Reasoning	AIME 2024	Accuracy	20	525
Mathematical Reasoning	AIME 2025	Accuracy	16.7	353
Mathematical Reasoning	Minerva	Pass@1 accuracy	14	289
Mathematical Reasoning	AMC23	Pass@1 accuracy	23.8	216
Mathematical Reasoning	OlympiadBench	Accuracy	29.4	213
Mathematical Reasoning	MATH 500	Accuracy	72.4	183
Mathematical Reasoning	LiveMathBench	Accuracy	3	19
Mathematical Reasoning	Minerva	Accuracy	23.2	18
Mathematical Reasoning	KSAT 2025	Accuracy	53.3	15
Mathematical Reasoning	MATH-OAI	Accuracy	90.2	9
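Several rows report Pass@1 accuracy. The listing does not state how many samples per problem were drawn for these figures, so the following is only a sketch of the widely used unbiased pass@k estimator, assuming n sampled responses per problem of which c are correct; for k = 1 it reduces to c / n averaged over problems. The function names `pass_at_k` and `benchmark_pass_at_k` are illustrative.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples, c of which are correct:
    1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[List[bool]], k: int = 1) -> float:
    """Average per-problem pass@k, as a percentage.
    results[i] holds the correctness of each sample for problem i."""
    scores = [pass_at_k(len(r), sum(r), k) for r in results]
    return 100.0 * sum(scores) / len(scores)


# Three problems, four samples each (hypothetical data).
results = [[True, False, True, False], [False] * 4, [True] * 4]
print(benchmark_pass_at_k(results, k=1))  # per-problem 0.5, 0.0, 1.0 -> 50.0
```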
