| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Interactive Decision-making | TextWorld | Real | 100 | 24 |
| Text-based Task Completion | TextWorld | Mean Normalised Score | 74.28 | 18 |
| Next-state prediction | TextWorld (TW) | EM Accuracy | 70.6 | 16 |
| Text-based agent interaction | TextWorld Cooking (test) | Accuracy | 75.5 | 14 |
| Text-based agent interaction | TextWorld Treasure (test) | Accuracy | 81.5 | 14 |
| Text-based agent interaction | TextWorld Quest (test) | Accuracy | 88 | 14 |
| Task success | TextWorld | Real | 100 | 14 |
| Agentic Task Success | TextWorld | Success Rate | 75 | 12 |
| Interactive Fiction | TextWorld | Success Rate (%) | 98.7 | 6 |
| Text-based agent interaction | TextWorld Cooking | Accuracy | 76 | 6 |
| Text-based agent interaction | TextWorld Treasure | Accuracy | 81 | 6 |
| Text-based agent interaction | TextWorld Quest | Accuracy | 88 | 6 |
| Language-Conditioned Tasks | TextWorld Cooking | Mean Episodic Return | 0.78 | 5 |
| Household task planning | TextWorld Cooking (test) | - | - | 0 |