PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models

About

The ability of large language models to solve complex mathematical problems has progressed significantly, particularly for tasks requiring advanced reasoning. However, the scarcity of sufficiently challenging problems, especially at the Olympiad level, hinders further progress. In this work, we introduce PromptCoT, a novel approach for automatically generating high-quality Olympiad-level math problems. The method synthesizes complex problems from mathematical concepts and the rationale behind problem construction, emulating the thought processes of experienced problem designers. We provide a theoretical analysis showing that an optimal rationale should maximize both the likelihood of rationale generation given the associated concepts and the likelihood of problem generation conditioned on both the rationale and the concepts. Our method is evaluated on standard benchmarks including GSM8K, MATH-500, and AIME2024, where it consistently outperforms existing problem generation methods. Furthermore, PromptCoT scales better with data: it maintains high performance as the dataset size increases and continues to outperform the baselines. The implementation is available at https://github.com/zhaoxlpku/PromptCoT.
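
The optimality condition stated in the abstract can be written compactly. The notation here is an assumption made for illustration and is not taken from the paper: c denotes the set of concepts, z a candidate rationale, and x the generated problem. Under that notation, one plausible reading of the condition is:

```latex
% Assumed notation: c = concepts, z = rationale, x = problem.
% The optimal rationale jointly maximizes both likelihoods named in the abstract.
z^{*} = \arg\max_{z} \; p(z \mid c)\, p(x \mid z, c)
      = \arg\max_{z} \; \bigl[ \log p(z \mid c) + \log p(x \mid z, c) \bigr]
```

The first factor rewards rationales that follow naturally from the concepts; the second rewards rationales that make the final problem likely. The paper's own formulation may differ in detail.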

Xueliang Zhao, Wei Wu, Jian Guan, Lingpeng Kong • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K (test) | Accuracy: 93.3 | 751 |
| Mathematical Reasoning | MATH500 (test) | Accuracy: 93 | 381 |
| Mathematical Reasoning | AIME 2024 (test) | Accuracy: 60 | 103 |
| Mathematical Reasoning | GSM8K v1 (test) | Accuracy: 87.1 | 35 |
| Mathematical Reasoning | AIME 2024 | -- | 11 |
| Mathematical Reasoning | MATH-500 and AIME2024 | Micro Avg. Accuracy: 80.8 | 5 |
| Mathematical Reasoning | PROMPTCOT | -- | 1 |
| Mathematical Reasoning | OpenMathInstruct | -- | 1 |
| Mathematical Reasoning | NuminaMath | -- | 1 |
| Mathematical Reasoning | Evol-Instruct | -- | 1 |
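
The "Micro Avg. Accuracy" entry for MATH-500 and AIME2024 is presumably a pooled accuracy: correct answers summed over both datasets, divided by the total number of problems. This page does not say how the figure was computed, so the following is only a minimal sketch of the standard definition. The function name and the example counts are hypothetical and are not taken from the paper or the leaderboard; the dataset sizes (500 and 30 problems) are assumed.

```python
from typing import Iterable, Tuple


def micro_average_accuracy(per_dataset: Iterable[Tuple[int, int]]) -> float:
    """Pool (correct, total) counts across datasets; return accuracy in percent."""
    pairs = list(per_dataset)
    total = sum(t for _, t in pairs)
    if total == 0:
        raise ValueError("no problems to score")
    correct = sum(c for c, _ in pairs)
    return 100.0 * correct / total


# Hypothetical counts for illustration only:
# (correct, total) for a 500-problem set and a 30-problem set.
example = micro_average_accuracy([(400, 500), (12, 30)])
```

Because the counts are pooled, the larger dataset dominates the result. A macro average, which averages the two per-dataset accuracies, would weight both datasets equally.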
