Agentic Tool-use on tau^2 Bench (GPT-4.1 Simulator Setting)

0.775 Retail Score

REACT(GPT-5)

Updated 4mo ago

Evaluation Results

Method	Links	Retail Score
REACT(GPT-5) 2026.02		0.775	0.975	0.517	0.803
REACT(GPT-4.1) 2026.02		0.667	0.5	0.417	0.55
SR 2026.02		0.525	0.458	0.433	0.48
Imitation Learning 2026.02		0.492	0.5	0.333	0.463
RWML + Policy RL 2026.02		0.483	0.417	0.5	0.46
RWML 2026.02		0.433	0.475	0.45	0.453
REACT(Qwen3-8B) 2026.02		0.425	0.3	0.233	0.337
WM SFT 2026.02		0.408	0.308	0.3	0.347
Policy RL 2026.02		0.342	0.45	0.317	0.38