AppWorld

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Agentic task solving | AppWorld | TGC: 90 | 28 |
| Multi-turn tool-use | AppWorld | Avg@4: 63.6 | 25 |
| Agentic Task Completion | AppWorld LeaderBoard | Greedy Success Rate: 48.8 | 13 |
| Tool Shortlisting | AppWorld v1.0 (test) | R-precision (AZ): 0.71 | 9 |
| Interactive environment task execution | AppWorld normal (test) | Avg@8 Success: 4,554 | 9 |
| Multimodal app-use reasoning | AppWorld | Cost: 0.05 | 7 |