TextCraft

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Interactive Decision Making | Textcraft | Success Rate: 99.6 | 42 |
| Item crafting | TextCraft (test) | Success Rate: 71 | 32 |
| One-step next-observation prediction | TextCraft (test) | Token F1: 95 | 16 |
| Language Agent Task | TextCraft | Success Rate (SR): 100 | 12 |
| Compositional planning | TextCraft | Success Rate: 94 | 8 |
| Autonomous Exploration | TextCraft | Steps: 8.7 | 7 |
| Task Execution | TEXTCRAFT-SYNTH 8K context Easy (evaluation) | Success Rate: 100 | 4 |
| Sequential Crafting | TextCraft-4 | Success Rate (SR): 45.5 | 4 |
| Sequential Crafting | TextCraft-3 | Success Rate (%): 82.5 | 4 |
| Sequential Crafting | TextCraft 2 | Success Rate (SR): 94.3 | 4 |
| Crafting ace items | TextCraft | Success Rate: 58 | 4 |