
Overcooked

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| CNOT minimization | Overcooked setting circuits | Avg CNOT Count: 3.52 | 26 |
| Coordination | Overcooked | Time: 43.53 | 14 |
| Human-Agent Coordination | Overcooked Multi-strategy Counter (human evaluation) | Average Score: 93.09 | 9 |
| Human-Agent Coordination | Overcooked Counter Circuit (human evaluation) | Average Score: 91.11 | 9 |
| Multi-agent coordination | Overcooked cross-play official AI library | Time: 48.05 | 7 |
| Multi-agent Coordination | Overcooked | IQM Return: 607.76 | 5 |
| Multi-agent Coordination | Overcooked 1.0 (Extended Evaluation 10 rounds x 25 episodes) | Success Rate: 93.5 | 5 |
| Multi-agent Coordination | Overcooked Short evaluation (10 rounds x 5 episodes) 1.0 | Success Rate: 90 | 5 |
| Teammate-type classification | Overcooked Coordination Ring layout | Classification Accuracy: 96 | 5 |
| Teammate-type classification | Overcooked Asymmetric Advantage layout | Classification Accuracy: 77 | 5 |
| Teammate-type classification | Overcooked Cramped Room layout | Accuracy: 96 | 5 |
| Open-set assistance | Overcooked Task Generalization (held-out) | Coaching Score (LLM-as-Judge): 85.96 | 4 |
| Open-set assistance | Overcooked Defects (held-out) | Coaching (LLM-as-judge): 77.8 | 4 |
| Human-AI Collaboration | Overcooked Cramped Room | Reward: 91.33 | 4 |
| Human-AI Collaboration | Overcooked Coordination Ring | Reward: 8.4 | 4 |
| Human-AI Collaboration | Overcooked Asymmetric Advantages | Reward: 72 | 4 |
| Detection | Overcooked Asym.-2 | Floor Score: 97 | 3 |
| Detection | Overcooked (Cramped-2) | Floor Score: 99 | 3 |
| Detection | Overcooked Asym.-1 | Floor Score: 99 | 3 |
| Detection | Overcooked Cramped-1 | Floor Score: 87 | 3 |
| Multi-agent task completion | Overcooked Hard novel maps (seen tasks) | Completion Rate: 56.3 | 3 |
| Multi-agent task completion | Overcooked Medium novel maps (seen tasks) | Completion Rate: 97.5 | 3 |
| Multi-agent task completion | Overcooked Easy novel maps (seen tasks) | Completion Rate: 100 | 3 |
| Multi-agent coordination | Overcooked unseen tasks, hard difficulty | Completion Rate: 43.7 | 2 |
| Multi-agent coordination | Overcooked unseen tasks, medium difficulty | Completion Rate: 1 | 2 |
Showing 25 of 30 rows