Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis

About

Advancing complex reasoning in large language models relies on high-quality, verifiable datasets, yet human annotation remains cost-prohibitive and difficult to scale. Current synthesis paradigms often face a recurring trade-off: maintaining structural validity typically restricts problem complexity, while relaxing constraints to increase difficulty frequently leads to inconsistent or unsolvable instances. To address this, we propose Agentic Proposing, a framework that models problem synthesis as a goal-driven sequential decision process where a specialized agent dynamically selects and composes modular reasoning skills. Through an iterative workflow of internal reflection and tool-use, we develop the Agentic-Proposer-4B using Multi-Granularity Policy Optimization (MGPO) to generate high-precision, verifiable training trajectories across mathematics, coding, and science. Empirical results demonstrate that downstream solvers trained on agent-synthesized data significantly outperform leading baselines and exhibit robust cross-domain generalization. Notably, a 30B solver trained on only 11,000 synthesized trajectories achieves a state-of-the-art 91.6% accuracy on AIME25, rivaling frontier-scale proprietary models such as GPT-5 and proving that a small volume of high-quality synthetic signals can effectively substitute for massive human-curated datasets.

Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Xuan Ren, Wei Wang, Bing Zhao, Hu Wei, Linfeng Zhang • 2026

Related benchmarks

Task                    Dataset            Metric            Result  Rank
Mathematical Reasoning  AIME 2024          Accuracy          93.5    251
Mathematical Reasoning  AIME 2025          Accuracy          91.6    227
Mathematical Reasoning  AMO-Bench          Mean@64 Accuracy  11.8    27
Scientific Reasoning    SuperGPQA          Mean@1            50.1    24
Code Generation         LiveCodeBench v6   Accuracy          71.2    23
Scientific Reasoning    GPQA               Mean@1            68.3    22
Mathematical Reasoning  AIME 2024          Mean@64 Accuracy  53.6    19
Mathematical Reasoning  AIME 2025          Mean@64 Acc       51.2    19
Mathematical Reasoning  HMMT February      Mean@64 Acc       0.365   19
Mathematical Reasoning  HMMT               HMMT Accuracy     77.6    14
