| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| CNOT minimization | Overcooked setting circuits | Avg CNOT Count: 3.52 | 26 |
| Coordination | Overcooked | Time: 43.53 | 14 |
| Human-Agent Coordination | Overcooked Multi-strategy Counter (human evaluation) | Average Score: 93.09 | 9 |
| Human-Agent Coordination | Overcooked Counter Circuit (human evaluation) | Average Score: 91.11 | 9 |
| Multi-agent coordination | Overcooked cross-play official AI library | Time: 48.05 | 7 |
| Multi-agent Coordination | Overcooked | IQM Return: 607.76 | 5 |
| Multi-agent Coordination | Overcooked 1.0 (Extended Evaluation 10 rounds x 25 episodes) | Success Rate: 93.5 | 5 |
| Multi-agent Coordination | Overcooked Short evaluation (10 rounds x 5 episodes) 1.0 | Success Rate: 90 | 5 |
| Teammate-type classification | Overcooked Coordination Ring layout | Classification Accuracy: 96 | 5 |
| Teammate-type classification | Overcooked Asymmetric Advantage layout | Classification Accuracy: 77 | 5 |
| Teammate-type classification | Overcooked Cramped Room layout | Accuracy: 96 | 5 |
| Open-set assistance | Overcooked Task Generalization (held-out) | Coaching Score (LLM-as-Judge): 85.96 | 4 |
| Open-set assistance | Overcooked Defects (held-out) | Coaching (LLM-as-judge): 77.8 | 4 |
| Human-AI Collaboration | Overcooked Cramped Room | Reward: 91.33 | 4 |
| Human-AI Collaboration | Overcooked Coordination Ring | Reward: 8.4 | 4 |
| Human-AI Collaboration | Overcooked Asymmetric Advantages | Reward: 72 | 4 |
| Detection | Overcooked Asym.-2 | Floor Score: 97 | 3 |
| Detection | Overcooked (Cramped-2) | Floor Score: 99 | 3 |
| Detection | Overcooked Asym.-1 | Floor Score: 99 | 3 |
| Detection | Overcooked Cramped-1 | Floor Score: 87 | 3 |
| Multi-agent task completion | Overcooked Hard novel maps (seen tasks) | Completion Rate: 56.3 | 3 |
| Multi-agent task completion | Overcooked Medium novel maps (seen tasks) | Completion Rate: 97.5 | 3 |
| Multi-agent task completion | Overcooked Easy novel maps (seen tasks) | Completion Rate: 100 | 3 |
| Multi-agent coordination | Overcooked unseen tasks, hard difficulty | Completion Rate: 43.7 | 2 |
| Multi-agent coordination | Overcooked unseen tasks, medium difficulty | Completion Rate: 1 | 2 |