| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Interactive Decision-making | TextWorld | Real: 100 | 24 |
| Next-state prediction | TextWorld (TW) | EM Accuracy: 70.6 | 16 |
| Task success | TextWorld | Real: 100 | 14 |
| Agentic Task Success | TextWorld | Success Rate: 75 | 12 |
| Household task planning | TextWorld Cooking (test) | - | 0 |