
ToRL: Scaling Tool-Integrated RL

About

We introduce ToRL (Tool-Integrated Reinforcement Learning), a framework for training large language models (LLMs) to autonomously use computational tools via reinforcement learning. Unlike supervised fine-tuning, ToRL allows models to explore and discover optimal strategies for tool use. Experiments with Qwen2.5-Math models show significant improvements: ToRL-7B reaches 43.3% accuracy on AIME 24, surpassing reinforcement learning without tool integration by 14% and the best existing Tool-Integrated Reasoning (TIR) model by 17%. Further analysis reveals emergent behaviors such as strategic tool invocation, self-regulation of ineffective code, and dynamic adaptation between computational and analytical reasoning, all arising purely through reward-driven learning.
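
To make the idea of a tool-integrated rollout concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical `<code>...</code>` tag format for tool calls, a toy `run_python` executor (a real system would use an isolated sandbox), an assumed binary correctness reward, and a `scripted_model` function standing in for the LLM. All of these names are illustrative and do not come from the paper.

```python
import contextlib
import io
import re
from typing import Callable

# Hypothetical tool-call format: the model wraps code in <code>...</code>.
CODE_TAG = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_python(source: str) -> str:
    """Toy executor: runs code and captures stdout. NOT a sandbox."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:  # feed errors back to the model as text
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def rollout(generate: Callable[[str], str], prompt: str, max_tool_calls: int = 3) -> str:
    """Alternate between model generation and tool execution."""
    trajectory = prompt
    for _ in range(max_tool_calls):
        segment = generate(trajectory)
        trajectory += segment
        match = CODE_TAG.search(segment)
        if match is None:  # no tool call: the model has finished
            return trajectory
        trajectory += f"\n<output>{run_python(match.group(1))}</output>\n"
    # Tool budget exhausted: let the model produce a final answer.
    return trajectory + generate(trajectory)


def reward(trajectory: str, gold: str) -> float:
    """Assumed rule-based reward: 1.0 if the last boxed answer matches, else 0.0."""
    answers = re.findall(r"\\boxed\{([^{}]*)\}", trajectory)
    return 1.0 if answers and answers[-1].strip() == gold else 0.0


def scripted_model(trajectory: str) -> str:
    """Stand-in for an LLM: calls the tool once, then answers from its output."""
    seen = re.search(r"<output>(.*?)</output>", trajectory, re.DOTALL)
    if seen is None:
        return "<code>print(sum(range(1, 101)))</code>"
    return f"The sum is \\boxed{{{seen.group(1)}}}."
```

Calling `rollout(scripted_model, "Compute 1 + 2 + ... + 100.\n")` yields a trajectory in which the executor's output is appended before the model continues, and `reward(trajectory, "5050")` scores it. In an RL setup, a scalar of this kind would be the only training signal, so any tool-use strategy has to emerge from it rather than from demonstrations.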

Xuefeng Li, Haoyang Zou, Pengfei Liu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | MATH | Accuracy 87.8 | 535 |
| Mathematical Reasoning | AIME 25 | Accuracy 27.9 | 201 |
| Mathematical Reasoning | AMC 23 | Accuracy 45 | 198 |
| Mathematical Reasoning | AIME24 | Accuracy 74 | 160 |
| Mathematical Reasoning | GSM8K | -- | 102 |
| Mathematical Reasoning | AIME 24 | AIME 24 Accuracy 23.33 | 84 |
| Scientific Question Answering | GPQA Diamond | Accuracy 51.5 | 64 |
| Expert-Level Question Answering | GPQA Diamond | Pass@1 65.15 | 39 |
| Geometry Problem Solving | Geometry3K | Accuracy 53.5 | 36 |
| Knowledge-intensive reasoning | MuSiQue | Accuracy 72 | 31 |
Showing 10 of 33 rows
