ASTER: Agentic Scaling with Tool-integrated Extended Reasoning

About

Reinforcement learning (RL) has emerged as a dominant paradigm for eliciting long-horizon reasoning in Large Language Models (LLMs). However, scaling Tool-Integrated Reasoning (TIR) via RL remains challenging due to interaction collapse: a pathological state where models fail to sustain multi-turn tool usage, instead degenerating into heavy internal reasoning with only trivial, post-hoc code verification. We systematically study three questions: (i) how cold-start SFT induces an agentic, tool-using behavioral prior, (ii) how the interaction density of cold-start trajectories shapes exploration and downstream RL outcomes, and (iii) how the RL interaction budget affects learning dynamics and generalization under varying inference-time budgets. We then introduce ASTER (Agentic Scaling with Tool-integrated Extended Reasoning), a framework that circumvents this collapse through a targeted cold-start strategy prioritizing interaction-dense trajectories. We find that a small expert cold-start set of just 4K interaction-dense trajectories yields the strongest downstream performance, establishing a robust prior that enables superior exploration during extended RL training. Extensive evaluations demonstrate that ASTER-4B achieves state-of-the-art results on competitive mathematical benchmarks, reaching 90.0% on AIME 2025, surpassing leading frontier open-source models, including DeepSeek-V3.2-Exp.
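The cold-start idea in the abstract, building a small set of interaction-dense trajectories, can be illustrated with a minimal Python sketch. This is not the authors' selection procedure: the `Trajectory` record, the `min_tool_calls` threshold, the correctness filter, and the densest-first ordering are all hypothetical choices made for illustration. Only the 4K set size comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record for one candidate cold-start trajectory."""
    problem_id: str
    tool_calls: int    # number of tool (e.g. code-execution) turns in the trajectory
    is_correct: bool   # whether the final answer matched the reference


def select_interaction_dense(trajectories, min_tool_calls=3, limit=4000):
    """Keep correct trajectories with at least `min_tool_calls` tool turns.

    The threshold and the correctness filter are assumptions, not values
    taken from the paper; `limit=4000` mirrors the 4K set size it reports.
    """
    dense = [
        t for t in trajectories
        if t.is_correct and t.tool_calls >= min_tool_calls
    ]
    # Densest trajectories first, so truncation keeps the most interactive ones.
    dense.sort(key=lambda t: t.tool_calls, reverse=True)
    return dense[:limit]
```

Under these assumptions, a trajectory that reasons internally and calls a tool once at the end for verification would be filtered out, which is the behavior the abstract describes as interaction collapse.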

Xuqin Zhang, Quan He, Zhenrui Zheng, Zongzhang Zhang, Xu He, Dong Li • 2026

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	AIME 2025	Accuracy	90	378
Mathematical Reasoning	HMMT 2025	Accuracy	77.1	241
Mathematical Reasoning	HMMT Feb 2025	--	--	54
Mathematical Reasoning	AIME 2026	Accuracy (avg@16)	78.8	30
Mathematical Reasoning	BeyondAIME	avg@16	61.7	23
Mathematical Reasoning	BeyondAIME	Accuracy	61.7	18
Mathematical Reasoning	Math Reasoning Suite Arithmetic Mean	Average Score (@16)	73.8	15
Mathematical Reasoning	AIME 2024	AIME 2024 Avg Score	85.8	14
Mathematical Reasoning	AIME 2025	Avg Score @16	90	14
Science Question Answering	GPQA Diamond	Average@16	63.42	3
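
Several rows above report avg@16 style metrics. The page does not define them, so the following Python sketch assumes the common convention: sample k answers per problem, take the fraction correct for each problem, and average across problems. The function name and input layout are hypothetical.

```python
def avg_at_k(per_problem_correct):
    """Mean accuracy in percent over k sampled answers per problem.

    `per_problem_correct` is a list with one entry per problem; each entry
    is a list of k booleans marking whether each sampled answer was correct.
    This follows the usual reading of avg@k, which is an assumption here.
    """
    per_problem = [sum(samples) / len(samples) for samples in per_problem_correct]
    return 100.0 * sum(per_problem) / len(per_problem)
```

For example, with two problems and four samples each, where the first problem is solved in 2 of 4 samples and the second in 4 of 4, the score would be 75.0.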
