| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Interactive Decision Making | Textcraft | Success Rate 99.6 | 42 |
| Item crafting | TextCraft (test) | Success Rate 71 | 32 |
| One-step next-observation prediction | TextCraft (test) | Token F1 95 | 16 |
| Language Agent Task | TextCraft | Success Rate (SR) 100 | 12 |
| Compositional planning | TextCraft | Success Rate 94 | 8 |
| Autonomous Exploration | TextCraft | Steps 8.7 | 7 |
| Task Execution | TEXTCRAFT-SYNTH 8K context Easy (evaluation) | Success Rate 100 | 4 |
| Sequential Crafting | TextCraft-4 | Success Rate (SR) 45.5 | 4 |
| Sequential Crafting | TextCraft-3 | Success Rate (%) 82.5 | 4 |
| Sequential Crafting | TextCraft 2 | Success Rate (SR) 94.3 | 4 |
| Crafting ace items | TextCraft | Success Rate 58 | 4 |