| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| GAIA | | Accuracy 87.4 | 266 | 4d ago | |
| GAIA | MemEvolve + Flash-Searcher | Avg Performance 80.61 | 54 | 4d ago | |
| GAIA Level 3 original (test) | | Performance 37.5 | 15 | 4d ago | |
| GAIA Level 2 original (test) | EvoRoute | Perf (%) 59.3 | 15 | 4d ago | |
| GAIA Level 1 original (test) | EvoRoute | Performance (%) 83.02 | 15 | 4d ago | |
| GAIA All levels original (test) | EvoRoute | Performance (%) 63.19 | 15 | 4d ago | |
| GAIA 2 | | Score 43.7 | 14 | 4d ago | |
| GAIA (test) | ExpSeek | Pass@3 63.11 | 8 | 4d ago | |
| GAIA level2 Text-only | TodoEvolve | Accuracy 57.14 | 8 | 4d ago | |
| GAIA Level 3 (val) | | Accuracy 44.2 | 2 | 4d ago | |
| GAIA Level 2 (val) | | Accuracy 54.2 | 2 | 4d ago | |
| GAIA Level 1 (val) | | Accuracy 62.3 | 2 | 4d ago | |