Multi-agent Reasoning on MMLU
[Chart: Accuracy over time. Best result: 91.02 (Single Best), Oct 1, 2025. Updated 14d ago.]
Evaluation Results

| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| Single Best | Ensemble=GPT-4o-2024-1... | 2025.10 | 91.02 |
| OW-L | Ensemble=GPT-4o-2024-1... | 2025.10 | 90.37 |
| OW-I | Ensemble=GPT-4o-2024-1... | 2025.10 | 90.37 |
| ISP | Ensemble=GPT-4o-2024-1... | 2025.10 | 90.01 |
| MV | Ensemble=GPT-4o-2024-1... | 2025.10 | 89.32 |
| OW-I | Ensemble Size=All Eigh... | 2025.10 | 88.64 |
| OW-L | Ensemble Size=All Eigh... | 2025.10 | 88.49 |
| ISP | Ensemble Size=All Eigh... | 2025.10 | 87.92 |
| MV | Ensemble Size=All Eigh... | 2025.10 | 87.19 |