| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Robotic Planning | LEMMA Single-Agent Overall | Success Rate (SR): 80 | 14 |
| Robotic Planning | LEMMA Single-Agent Underspecified | Success Rate (SR): 99 | 14 |
| Robotic Planning | LEMMA Single-Agent Absence | Success Rate (SR): 96 | 14 |
| Robotic Planning | LEMMA Single-Agent Multiplicity | Success Rate (SR): 77 | 14 |
| Robotic Planning | LEMMA Stack and Pass tasks, partially observed (test) | Success Rate: 75 | 8 |
| Robotic Task Planning | LEMMA Single-agent | Calls per Episode: 10.36 | 4 |
| Fault detection | LEMMA | AUROC: 74.1 | 2 |