| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| GAIA | Argus-35B-A3B (Parallel) | Accuracy 93.2 | 291 | 16d ago | |
| GAIA | MemEvolve + Flash-Searcher | Avg Performance 80.61 | 72 | 21d ago | |
| GAIA | | Pass@1 Score 83.4 | 38 | 15d ago | |
| GAIA | PIVOT | Task Success Rate 71.5 | 30 | 21d ago | |
| GAIA n=165 (dev) | OAgent | Average Accuracy 73.93 | 23 | 1mo ago | |
| GAIA | MiroThinker-H1 | Avg@8 Score 88.5 | 22 | 1mo ago | |
| GAIA Level 3 original (test) | | Performance 37.5 | 15 | 3mo ago | |
| GAIA Level 2 original (test) | EvoRoute | Perf (%) 59.3 | 15 | 3mo ago | |
| GAIA Level 1 original (test) | EvoRoute | Performance (%) 83.02 | 15 | 3mo ago | |
| GAIA All levels original (test) | EvoRoute | Performance (%) 63.19 | 15 | 3mo ago | |
| GAIA Out-of-Distribution | | Accuracy 47 | 14 | 1mo ago | |
| GAIA 2 | | Score 43.7 | 14 | 3mo ago | |
| GAIA Level 2 | EvoMAS-7 | Success Rate 42.9 | 8 | 22d ago | |
| GAIA Level 1 | EvoMAS-7 | Success Rate 66.7 | 8 | 22d ago | |
| GAIA (test) | ExpSeek | Pass@3 63.11 | 8 | 3mo ago | |
| GAIA level2 Text-only | TodoEvolve | Accuracy 57.14 | 8 | 3mo ago | |
| GAIA 68 curated tasks | GraphBit | Accuracy 67.6 | 7 | 19d ago | |
| GAIA Level 3 (val) | | Accuracy 44.2 | 2 | 3mo ago | |
| GAIA Level 2 (val) | | Accuracy 54.2 | 2 | 3mo ago | |
| GAIA Level 1 (val) | | Accuracy 62.3 | 2 | 3mo ago | |