Multi-agent Reasoning on ARMMAN
Top result: 85.78 Accuracy (OW-L)
[Chart: Accuracy over time; x-axis label Oct 1, 2025]
Updated 14d ago
Evaluation Results
Method       Configuration                 Date      Accuracy
OW-L         Ensemble=GPT-4o-2024-1...     2025.10   85.78
OW-I         Ensemble=GPT-4o-2024-1...     2025.10   85.78
ISP          Ensemble=GPT-4o-2024-1...     2025.10   85.78
Single Best  Ensemble=GPT-4o-2024-1...     2025.10   85.32
MV           Ensemble=GPT-4o-2024-1...     2025.10   85.24
OW-L         Ensemble Size=All Eigh...     2025.10   85.1
OW-I         Ensemble Size=All Eigh...     2025.10   84.94
ISP          Ensemble Size=All Eigh...     2025.10   84.79
MV           Ensemble Size=All Eigh...     2025.10   84.4