
Textworld

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Interactive Decision-making | TextWorld | Real: 100 | 24 |
| Text-based Task Completion | TextWorld | Mean Normalised Score: 74.28 | 18 |
| Next-state prediction | TextWorld (TW) | EM Accuracy: 70.6 | 16 |
| Text-based agent interaction | TextWorld Cooking (test) | Accuracy: 75.5 | 14 |
| Text-based agent interaction | TextWorld Treasure (test) | Accuracy: 81.5 | 14 |
| Text-based agent interaction | TextWorld Quest (test) | Accuracy: 88 | 14 |
| Task success | TextWorld | Real: 100 | 14 |
| Agentic Task Success | Textworld | Success Rate: 75 | 12 |
| Interactive Fiction | TextWorld | Success Rate (%): 98.7 | 6 |
| Text-based agent interaction | TextWorld Cooking | Accuracy: 76 | 6 |
| Text-based agent interaction | TextWorld Treasure | Accuracy: 81 | 6 |
| Text-based agent interaction | TextWorld Quest | Accuracy: 88 | 6 |
| Language-Conditioned Tasks | TextWorld Cooking | Mean Episodic Return: 0.78 | 5 |
| Household task planning | TextWorld Cooking (test) | Metric: - | 0 |