Cooperative Cooking on Overcooked-AI Forced Coordination
Top result: Diagnostic-Grounded Search, J Score 103 (Mar 25, 2026)
[Chart: J Score by date; available metrics: J Score, Deliveries, Invalid Deliveries]
Updated 24d ago
Evaluation Results
| Method | Generation | Date | J Score | Deliveries | Invalid Deliveries |
|---|---|---|---|---|---|
| Diagnostic-Grounded Search | Gen 2 | 2026.03 | 103 | 5.15 | 0.3 |
| Diagnostic-Grounded Search | Gen 1 | 2026.03 | 55 | 2.75 | 1.25 |
| MAPPO | Baseline | 2026.03 | 32 | 1.6 | 0.8 |