| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|---|
| LLM Agent Navigation | BabyAI (test) | Success Rate: 93.3 | 25 |
| Instruction Following | BabyAI | Success Rate: 72.56 | 14 |
| Instruction Following | BabyAI BossLevel | Success Rate: 96.2 | 14 |
| Imitation Learning | BabyAI BossLevel (test) | Success Rate: 72 | 9 |
| Imitation Learning | BabyAI SynthSeq (test) | Success Rate: 0.642 | 9 |
| Imitation Learning | BabyAI GoToSeq (test) | Success Rate: 77.2 | 9 |
| Instruction Following | BabyAI Synthseq | Average Episodic Reward: 0.361 | 7 |
| Instruction Following | BabyAI Pickup | Average Episodic Reward: 0.486 | 7 |
| Instruction Following | BabyAI Goto | Average Episodic Reward: 0.575 | 7 |
| Bosslevel | BabyAI | Average Pass Rate: 0.343 | 7 |
| Synthseq | BabyAI | Average Pass Rate: 32.1 | 7 |
| Pickup | BabyAI | Average Pass Rate: 33.4 | 7 |
| Goto | BabyAI | Average Pass Rate: 0.606 | 7 |
| Representational Alignment | BabyAI instruction set | P@10: 46.65 | 7 |
| Hierarchical Planning | BabyAI Combined Skills 3 | Token Cost: 2,454 | 6 |
| Hierarchical Planning | BabyAI Combined Skills 2 | Token Cost: 2,528 | 6 |
| Hierarchical Planning | BabyAI Combined Skills 1 | Token Cost: 1,961 | 6 |
| Hierarchical Planning | BabyAI Unlock | Token Cost: 5,705 | 6 |
| Hierarchical Planning | BabyAI Pickup | Token Cost: 2,405 | 6 |
| Navigation | BabyAI | Success Rate: 93.2 | 2 |