
TextWorld

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Interactive Decision-making | TextWorld | Real: 100 | 24 |
| Next-state prediction | TextWorld (TW) | EM Accuracy: 70.6 | 16 |
| Task success | TextWorld | Real: 100 | 14 |
| Agentic Task Success | Textworld | Success Rate: 75 | 12 |
| Household task planning | TextWorld Cooking (test) | - | 0 |