| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| GPQA | CoTAgent | Calls | 198 | 9 | 4d ago |
| AIME 24 | CoTAgent | Calls | 30 | 9 | 4d ago |
| Reasoning Benchmarks Cooperative AutoGen framework (test) | MARSHAL (Generalist, 8B) | Overall Accuracy | 83.58 | 2 | 4d ago |
| Reasoning Benchmarks Competitive MAD framework (test) | MARSHAL (Generalist, 8B) | Average Score | 0.8509 | 2 | 4d ago |