
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

About

Reinforcement learning (RL) has emerged as a central paradigm for training large language models (LLMs) on reasoning tasks. Yet recent studies question RL's ability to incentivize reasoning capacity beyond the base model. This raises a key challenge: how can RL be adapted to solve harder reasoning problems more effectively? To address this challenge, we propose a simple yet effective strategy, Question Augmentation: introduce partial solutions during training to reduce problem difficulty and provide more informative learning signals. When applied during RL training on math reasoning tasks, our method, QuestA, improves not only pass@1 but also pass@k, particularly on problems where standard RL struggles to make progress. This enables continual improvement over strong open-source models such as DeepScaleR and OpenMath Nemotron, further enhancing their reasoning capabilities. We achieve new state-of-the-art results on math benchmarks using 1.5B-parameter models: 72.50% (+10.73%) on AIME24, 62.29% (+12.79%) on AIME25, and 41.67% (+10.11%) on HMMT25. Code, data, and models are available at https://github.com/foreverlasting1202/QuestA.

Jiazheng Li, Hongzhou Lin, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Yi Wu, Jingzhao Zhang• 2025

Related benchmarks

Task                     Dataset         Metric            Result    Rank
Mathematical Reasoning   AIME 2025       Accuracy          62.08     227
Mathematical Reasoning   AIME 24         Accuracy          74.26     154
Mathematical Reasoning   MATH 500        Accuracy (Acc)    94.05     149
Mathematical Reasoning   AMC 2023        Accuracy          93.44     124
Mathematical Reasoning   OlympiadBench   Accuracy          78.53     81
Mathematical Reasoning   OlympiadBench   Accuracy          0.7228    72
Mathematical Reasoning   BRUMO25         Accuracy          73.75     62
Mathematical Reasoning   AIME 25         Pass@1 Accuracy   64.99     56
Mathematical Reasoning   AMC 23          Accuracy          95.1      56
Mathematical Reasoning   Minerva Math    Accuracy          32.08     54

(Showing 10 of 21 rows)
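The benchmark rows report pass@1-style accuracy. For context, pass@k is commonly computed with the standard unbiased estimator over n sampled generations of which c are correct; the sketch below shows that estimator. This is the widely used formulation, not code from the QuestA repository.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    Probability that at least one of k samples drawn without replacement
    from n generations is correct, given that c of the n are correct:
        1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # fewer than k incorrect samples exist, so some draw must be correct
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 generations of which 1 is correct, pass@1 is 0.5; with all generations correct, pass@k is 1.0 for any k ≤ n.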
