Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis
About
Advancing complex reasoning in large language models relies on high-quality, verifiable datasets, yet human annotation remains cost-prohibitive and difficult to scale. Current synthesis paradigms often face a recurring trade-off: maintaining structural validity typically restricts problem complexity, while relaxing constraints to increase difficulty frequently leads to inconsistent or unsolvable instances. To address this, we propose Agentic Proposing, a framework that models problem synthesis as a goal-driven sequential decision process where a specialized agent dynamically selects and composes modular reasoning skills. Through an iterative workflow of internal reflection and tool-use, we develop the Agentic-Proposer-4B using Multi-Granularity Policy Optimization (MGPO) to generate high-precision, verifiable training trajectories across mathematics, coding, and science. Empirical results demonstrate that downstream solvers trained on agent-synthesized data significantly outperform leading baselines and exhibit robust cross-domain generalization. Notably, a 30B solver trained on only 11,000 synthesized trajectories achieves a state-of-the-art 91.6% accuracy on AIME25, rivaling frontier-scale proprietary models such as GPT-5 and proving that a small volume of high-quality synthetic signals can effectively substitute for massive human-curated datasets.
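The synthesis loop described above — an agent sequentially selecting modular skills, composing them into a candidate problem, and reflecting when a composition breaks verifiability — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the skill library, `is_verifiable` check, and `propose_problem` loop are all hypothetical stand-ins for the actual Agentic-Proposer-4B policy and verifier tools.

```python
import random

# Hypothetical skill library: each skill is a composable transformation
# that raises a problem's difficulty while preserving its structure.
SKILLS = {
    "modular_arithmetic": lambda stem: f"{stem}; report the answer mod 1000",
    "parameterize":       lambda stem: f"{stem}, generalized to an integer n >= 1",
    "add_constraint":     lambda stem: f"{stem}, subject to an extra parity constraint",
}

def is_verifiable(problem: str) -> bool:
    """Stand-in verifier: in the real framework this would be a tool call
    that checks the problem is well-posed and solvable; here we only
    enforce a length budget."""
    return len(problem) < 300

def propose_problem(seed: str, max_steps: int = 3, rng=None) -> str:
    """Goal-driven sequential decision process (sketch): at each step the
    agent picks a skill, composes it onto the current problem, and keeps
    the result only if it remains verifiable (a crude 'reflection')."""
    rng = rng or random.Random(0)
    problem = seed
    for _ in range(max_steps):
        skill = rng.choice(list(SKILLS))
        candidate = SKILLS[skill](problem)
        if is_verifiable(candidate):
            problem = candidate  # accept the composition
        # else: reflection step — discard the candidate and try again
    return problem
```

Because every accepted step must pass the verifier, the loop trades raw difficulty for guaranteed validity at each composition, which is the trade-off the framework is designed to navigate.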
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 2024 | Accuracy | 93.5 | 251 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 91.6 | 227 |
| Mathematical Reasoning | AMO-Bench | Mean@64 Accuracy | 11.8 | 27 |
| Scientific Reasoning | SuperGPQA | Mean@1 | 50.1 | 24 |
| Code Generation | LiveCodeBench v6 | Accuracy | 71.2 | 23 |
| Scientific Reasoning | GPQA | Mean@1 | 68.3 | 22 |
| Mathematical Reasoning | AIME 2024 | Mean@64 Accuracy | 53.6 | 19 |
| Mathematical Reasoning | AIME 2025 | Mean@64 Accuracy | 51.2 | 19 |
| Mathematical Reasoning | HMMT February | Mean@64 Accuracy | 0.365 | 19 |
| Mathematical Reasoning | HMMT | Accuracy | 77.6 | 14 |