Agentic Tool-use on tau^2 Bench (GPT-4.1 Simulator Setting)

0.775 Retail Score

REACT(GPT-5)

[Chart: Retail Score by date; plotted date Feb 5, 2026]
Updated 4d ago

Evaluation Results

Method    Links

Date      Retail
2026.02   0.775    0.975    0.517    0.803
2026.02   0.667    0.500    0.417    0.550
2026.02   0.525    0.458    0.433    0.480
2026.02   0.492    0.500    0.333    0.463
2026.02   0.483    0.417    0.500    0.460
2026.02   0.433    0.475    0.450    0.453
2026.02   0.425    0.300    0.233    0.337
2026.02   0.408    0.308    0.300    0.347
2026.02   0.342    0.450    0.317    0.380
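
The page does not label the three score columns after Retail. As a rough sanity check on the rows above, the sketch below tests one hypothesis: that the last value in each row is a weighted mean of the first three. The weights (0.4, 0.4, 0.2) are an assumption inferred from the numbers themselves, not something the page states, and the names `ROWS`, `WEIGHTS`, `weighted_mean`, and `max_deviation` are illustrative only.

```python
# Scores copied from the rows above: three per-column scores, then the last column.
ROWS = [
    (0.775, 0.975, 0.517, 0.803),
    (0.667, 0.500, 0.417, 0.550),
    (0.525, 0.458, 0.433, 0.480),
    (0.492, 0.500, 0.333, 0.463),
    (0.483, 0.417, 0.500, 0.460),
    (0.433, 0.475, 0.450, 0.453),
    (0.425, 0.300, 0.233, 0.337),
    (0.408, 0.308, 0.300, 0.347),
    (0.342, 0.450, 0.317, 0.380),
]

# ASSUMPTION: weights inferred from the data, not stated on the page.
WEIGHTS = (0.4, 0.4, 0.2)


def weighted_mean(scores, weights=WEIGHTS):
    """Weighted mean of the per-column scores."""
    return sum(s * w for s, w in zip(scores, weights))


def max_deviation(rows=ROWS):
    """Largest absolute gap between the weighted mean and the reported last column."""
    return max(abs(weighted_mean(r[:3]) - r[3]) for r in rows)


if __name__ == "__main__":
    for row in ROWS:
        print(f"reported={row[3]:.3f}  recomputed={weighted_mean(row[:3]):.4f}")
    print(f"max deviation: {max_deviation():.4f}")
```

If the hypothesis holds, every recomputed value should sit within rounding error (about 0.001) of the reported one, since the page shows scores to three decimals. A larger gap would suggest a different aggregation or a transcription error in the row.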