
ToRL: Scaling Tool-Integrated RL

About

We introduce ToRL (Tool-Integrated Reinforcement Learning), a framework for training large language models (LLMs) to autonomously use computational tools via reinforcement learning. Unlike supervised fine-tuning, ToRL allows models to explore and discover optimal strategies for tool use. Experiments with Qwen2.5-Math models show significant improvements: ToRL-7B reaches 43.3% accuracy on AIME 24, surpassing reinforcement learning without tool integration by 14% and the best existing Tool-Integrated Reasoning (TIR) model by 17%. Further analysis reveals emergent behaviors such as strategic tool invocation, self-regulation of ineffective code, and dynamic adaptation between computational and analytical reasoning, all arising purely through reward-driven learning.

Xuefeng Li, Haoyang Zou, Pengfei Liu • 2025
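
The abstract does not describe the rollout format or the reward in detail, so the following is only a minimal sketch of what a tool-integrated rollout with an outcome reward might look like. Everything named here is hypothetical and not taken from the paper: the `<tool>`, `<output>`, and `<answer>` tags, the helpers `run_tool`, `rollout`, `reward`, and `toy_policy`, the cap on tool calls, and the binary exact-match reward. The policy-gradient update itself is omitted, and the scripted `toy_policy` stands in for a language model.

```python
import contextlib
import io
import re
from typing import Callable

# Hypothetical delimiters; the source does not specify a tool-call format.
TOOL_PATTERN = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
OUTPUT_PATTERN = re.compile(r"<output>(.*?)</output>", re.DOTALL)


def run_tool(code: str) -> str:
    """Execute a code snippet and return its stdout, or an error message.

    Plain exec() is NOT a sandbox; it stands in for an isolated interpreter.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"error: {type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def rollout(policy: Callable[[str], str], prompt: str, max_tool_calls: int = 3) -> str:
    """Alternate generation and tool execution until an answer or the call cap."""
    context = prompt
    calls = 0
    while True:
        segment = policy(context)
        context += segment
        if ANSWER_PATTERN.search(segment):
            break  # the policy committed to a final answer
        call = TOOL_PATTERN.search(segment)
        if call is None or calls >= max_tool_calls:
            break  # nothing to execute, or the tool budget is spent
        # Feed the interpreter output back so the next segment can use it.
        context += f"\n<output>{run_tool(call.group(1))}</output>\n"
        calls += 1
    return context


def reward(trajectory: str, gold: str) -> float:
    """Outcome reward (assumed): 1.0 if the last answer matches gold, else 0.0."""
    answers = ANSWER_PATTERN.findall(trajectory)
    return 1.0 if answers and answers[-1].strip() == gold else 0.0


def toy_policy(context: str) -> str:
    """Scripted stand-in for an LLM: call the tool once, then report its output."""
    match = OUTPUT_PATTERN.search(context)
    if match is None:
        return "<tool>print(sum(range(1, 101)))</tool>"
    return f"<answer>{match.group(1)}</answer>"


trajectory = rollout(toy_policy, "What is 1 + 2 + ... + 100?\n")
print(reward(trajectory, gold="5050"))  # prints 1.0
```

In a training loop of this shape, the scalar returned by `reward` would be the only learning signal, which is one way to read the abstract's claim that tool-use behaviors arise "purely through reward-driven learning".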

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | MATH | Accuracy: 87.8 | 535 |
| Mathematical Reasoning | AIME 25 | Accuracy: 27.9 | 201 |
| Mathematical Reasoning | AMC 23 | Accuracy: 45 | 198 |
| Mathematical Reasoning | AIME24 | Accuracy: 74 | 130 |
| Mathematical Reasoning | GSM8K | -- | 102 |
| Mathematical Reasoning | AIME 24 | AIME 24 Accuracy: 23.33 | 84 |
| Scientific Question Answering | GPQA Diamond | Accuracy: 51.5 | 64 |
| Expert-Level Question Answering | GPQA Diamond | Pass@1: 65.15 | 39 |
| Knowledge-intensive reasoning | MuSiQue | Accuracy: 72 | 31 |
| Function Calling | BFCL (Berkeley Function Calling Leaderboard) | Base Score: 0.00e+0 | 28 |
Showing 10 of 27 rows
