| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Abstractive Summarization | Stack ConvoSumm 1.0 (test) | ROUGE-1 | 39.73 | 11 |
| Object Stacking | Stack Composition C (test) | Success Rate | 93.7 | 10 |
| Object Stacking | Stack Spuriousness S (test) | Success Rate | 97.6 | 10 |
| Object Stacking | Stack In-distribution I (test) | Success Rate | 97.2 | 10 |
| Robotic Manipulation | Stack Shifted Environment (test) | Testing Reward | 0.77 | 8 |
| Task Planning | Stack 1.0 (test) | Average Planning Time Cost (s) | 5.94 | 3 |