Multi-agent policy synthesis on Gathering
[Chart: Metric U by method. Highlighted point: Gemini 3.1 Pro, Metric U 4.59, Mar 19, 2026. Selectable metrics: Metric U, Metric E, Metric S.]
Updated 2mo ago
Evaluation Results
| Method                | Feedback      | Date    | Metric U | Metric E | Metric S |
|-----------------------|---------------|---------|----------|----------|----------|
| Gemini 3.1 Pro        | reward+social | 2026.03 | 4.59     | 97       | 502.7    |
| Gemini 3.1 Pro        | reward-only   | 2026.03 | 4.58     | 97       | 502.5    |
| Gemini 3.1 Pro        | zero-shot     | 2026.03 | 3.71     | 79       | 443.2    |
| Claude Sonnet 4.6     | reward+social | 2026.03 | 3.53     | 84       | 452.7    |
| Claude Sonnet 4.6     | reward-only   | 2026.03 | 3.47     | 72       | 402.9    |
| GEPA (Gemini 3.1 Pro) | -             | 2026.03 | 3.45     | 91       | 496.2    |
| Claude Sonnet 4.6     | zero-shot     | 2026.03 | 1.85     | 52       | 298.6    |
| BFS Collector         | -             | 2026.03 | 1.29     | 54       | 489.5    |
| Q-learner             | -             | 2026.03 | 0.77     | 83       | 508.2    |