ToRL: Scaling Tool-Integrated RL

About

We introduce ToRL (Tool-Integrated Reinforcement Learning), a framework for training large language models (LLMs) to autonomously use computational tools via reinforcement learning. Unlike supervised fine-tuning, ToRL allows models to explore and discover optimal strategies for tool use. Experiments with Qwen2.5-Math models show significant improvements: ToRL-7B reaches 43.3% accuracy on AIME 24, surpassing reinforcement learning without tool integration by 14% and the best existing Tool-Integrated Reasoning (TIR) model by 17%. Further analysis reveals emergent behaviors such as strategic tool invocation, self-regulation of ineffective code, and dynamic adaptation between computational and analytical reasoning, all arising purely through reward-driven learning.

Xuefeng Li, Haoyang Zou, Pengfei Liu • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH	Accuracy 87.8	535
Mathematical Reasoning	MATH 500	--	236
Mathematical Reasoning	AIME 25	Accuracy 27.9	201
Mathematical Reasoning	AMC 23	Accuracy 45	198
Mathematical Reasoning	AIME24	Accuracy 74	160
Scientific Question Answering	GPQA Diamond	Accuracy 51.5	123
Code Generation	EvalPlus	--	115
Mathematical Reasoning	GSM8K	--	102
Mathematical Reasoning	AIME 24	AIME 24 Accuracy 23.33	84
Mathematical Reasoning	AIME 2024	Mean Score (k=8) 30	81

Showing 10 of 48 rows
