| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Robotic Manipulation | CALVIN ABCD->D | Avg Length: 0.4 | 130 | |
| Long-horizon robot manipulation | Calvin ABCD→D | Task 1 Completion Rate: 99.4 | 127 | |
| Robotic Manipulation | Calvin ABC-D | Task-1 Score: 100 | 71 | |
| Long-horizon task completion | Calvin ABC->D | Success Rate (1): 96.8 | 67 | |
| Sequential Robotic Manipulation | CALVIN | Success Rate (1 task): 99.8 | 63 | |
| Robot Manipulation | CALVIN (ABC->D) | Average Successful Length: 4.75 | 62 | |
| Long-horizon robotic manipulation | CALVIN ABC-D | Average Trajectory Length: 0.27 | 40 | |
| Robotic Manipulation | CALVIN D→D | Average Length: 4.52 | 40 | |
| Instruction-following robotic manipulation | CALVIN ABC→D (unseen environment D) | Success Rate (Length 1): 98.5 | 29 | |
| Robot Manipulation | CALVIN ABC->D 1.0 | Success Rate (1 Inst): 96.8 | 18 | |
| Long-horizon Robot Manipulation | CALVIN long-horizon | Success Rate 1: 96.9 | 17 | |
| Long-horizon language-conditioned policy learning | CALVIN | Success Rate (Step 5/5): 98.4 | 16 | |
| Long-horizon robotic manipulation | CALVIN ABC→D Zero-shot | Task 1 Success Rate: 98.8 | 16 | |
| Long-horizon robot manipulation | CALVIN | Task Completion Rate (1): 96.3 | 15 | |
| Long-horizon task completion | CALVIN | Success Rate (1 Task): 93.8 | 15 | |
| Robotic Manipulation | CALVIN | Average Length: 2.55 | 13 | |
| Long-Horizon Multi-Task Language Control | CALVIN ABC→D (test) | Seq Success (1): 96 | 13 | |
| Long-horizon language-conditioned manipulation | Calvin ABC→D | Success Rate (Seq 1): 97.3 | 12 | |
| Language-Conditioned Manipulation | CALVIN MTLC | Success Rate: 95 | 12 | |
| Long-horizon task success | CALVIN D→D long-horizon | Success Rate (LH-1): 99.5 | 11 | |
| Robot manipulation | CALVIN 10% ABCD → D | Success Rate (L=1): 84.1 | 11 | |
| turn off lightbulb | CALVIN | Success Rate: 100 | 10 | |
| Language-conditioned manipulation | CALVIN LH-MTLC | Success Rate (1 Instruction): 97.5 | 10 | |
| Failure Detection | DSMF-CALVIN (test) | Accuracy: 90.64 | 10 | |
| Language-conditioned Robotic Instruction Following | CALVIN ABC→D | Success Rate (1 Task): 98.9 | 8 | |