
Simulating Environments with Reasoning Models for Agent Training

About

LLM agents excel in compact environments requiring deep reasoning but remain brittle when operating in broader, more complex contexts that demand robustness across diverse tools and schemas. Building bespoke environments for training is heavy, brittle, and limits progress. In this paper, we demonstrate that LLMs can simulate realistic environment feedback without access to actual testbed data or APIs. Inspired by this capability, we propose two frameworks: Simia-SFT, a pipeline that synthesizes SFT data by amplifying small seed sets into diverse trajectories in an environment-agnostic manner, and Simia-RL, a framework that enables RL training without real environment implementations through LLM-simulated feedback. Fine-tuning open models yields consistent improvements across multiple benchmarks, surpassing GPT-4o and approaching o4-mini on $\tau^2$-Bench. Together, Simia-SFT and Simia-RL enable scalable agent training without environment engineering, replacing heavy and brittle implementations with flexible LLM-based simulation.
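The core idea behind Simia-RL, as described above, is to replace a real environment implementation with an LLM that generates plausible observations for each tool call. A minimal sketch of such a rollout loop is below; the paper's actual prompts, models, and reward logic are not given here, so `call_llm` is a hypothetical stand-in (a canned simulator returning fixed JSON), and the retail tool names are illustrative assumptions.

```python
# Sketch of an LLM-simulated environment loop in the spirit of Simia-RL:
# no real testbed or API is invoked; an LLM is prompted to play the
# environment and return a realistic observation for each tool call.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call acting as the environment simulator.
    Here it returns canned tool outputs keyed on the current tool call,
    which appears on the last line of the prompt."""
    current_call = prompt.splitlines()[-1]
    if "get_order" in current_call:
        return '{"order_id": "A100", "status": "shipped"}'
    if "refund_order" in current_call:
        return '{"order_id": "A100", "refund": "issued"}'
    return '{"error": "unknown tool"}'

def simulated_env_step(history: list, tool_call: str) -> str:
    """Ask the simulator LLM what the environment would return for
    tool_call, conditioned on the interaction so far."""
    prompt = (
        "You are simulating a retail environment. Interaction so far:\n"
        f"{history}\n"
        f"Return a realistic JSON observation for the tool call: {tool_call}"
    )
    return call_llm(prompt)

def rollout(agent_actions: list) -> list:
    """Collect one trajectory of (action, simulated observation) pairs,
    suitable for computing rewards or filtering into SFT data."""
    history, trajectory = [], []
    for action in agent_actions:
        obs = simulated_env_step(history, action)
        history.append((action, obs))
        trajectory.append((action, obs))
    return trajectory

traj = rollout([
    "get_order(order_id='A100')",
    "refund_order(order_id='A100')",
])
```

In a full setup, `call_llm` would hit an actual model endpoint and the collected trajectories would feed a reward model or policy-gradient update; the point of the sketch is only that the environment side of the loop needs no real implementation.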

Yuetai Li, Huseyin A Inan, Xiang Yue, Wei-Ning Chen, Lukas Wutschitz, Janardhan Kulkarni, Radha Poovendran, Robert Sim, Saravan Rajmohan • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Function Calling | BFCL V3 | Overall Accuracy | 67.68 | 104 |
| Interactive Tool-Use Agent Performance | tau2-Bench | Retail Performance Score | 52.9 | 102 |
| Agentic Workflow Success | τ2-bench | Airline Success Rate | 34 | 43 |
| Tool Use | BFCL Multi-turn | Accuracy | 23.22 | 24 |
| Agentic Tool-use | tau2-bench Airline | Pass@1 | 48.5 | 22 |
| Agentic Tool-use | tau2-bench Retail | Pass@1 | 62.5 | 22 |
| Tool-Use Agent Evaluation | BFCL Multiturn (OOD) v3 (test) | Base Rate | 4.5 | 18 |
| Tool Use | Tau-Bench | TAU-AIR Score | 52 | 14 |
| Agentic Task Success | MCP-Universe | Location Success Score | 5.71 | 11 |
| Coding Agent | RebenchT | OH-p@1 | 21.39 | 5 |
Showing 10 of 13 rows
