| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General AI Assistant Tasks | GAIA | Accuracy | 87.4 | 274 |
| General AI Assistant tasks | GAIA | Avg Performance | 80.61 | 54 |
| Agentic Evaluation | GAIA | Accuracy | 28.12 | 50 |
| Deep search | GAIA | Accuracy | 81.9 | 43 |
| General AI Assistant Task | GAIA (val) | Level 1 Score | 94.3 | 43 |
| General AI Assistant tasks | GAIA | Pass@1 Score | 70.5 | 26 |
| Agentic Benchmarks | GAIA | Execution Time (min) | 1.6 | 25 |
| Deep Search | GAIA text-only (val) | Accuracy | 70.9 | 24 |
| Deep research | GAIA | Accuracy | 78.2 | 24 |
| Embodied Agentic | GAIA | Accuracy | 0.672 | 21 |
| Deep Research | GAIA text-only original (test) | Pass@1 | 74.1 | 20 |
| General AI Assistant | GAIA text-only | Score | 81.9 | 19 |
| General AI Assistant | GAIA text | GAIA Average Score | 70.5 | 19 |
| Long-Horizon Search Intelligence | GAIA | Pass@1 | 57.3 | 18 |
| Multi-turn tool use | GAIA | Pass@1 | 76.4 | 18 |
| Question Answering | GAIA | Accuracy (Pass@4) | 51 | 18 |
| General AI Assistant Reasoning | GAIA Full | Accuracy | 60.12 | 18 |
| General AI Assistant Reasoning | GAIA (File/Reasoning/Others) | Accuracy | 56.21 | 18 |
| General AI Assistant Reasoning | GAIA (Web) | Accuracy | 63.33 | 18 |
| General AI Assistant Reasoning | GAIA | Pass@1 Accuracy | 67.4 | 17 |
| Agentic Reasoning | GAIA (val) | Average Score | 86.06 | 17 |
| Inference Time Consumption | GAIA | Latency (Research And Data) | 11.2 | 16 |
| Information-Seeking | GAIA 103-question text-only | Pass@1 | 75.7 | 16 |
| Deep Research | GAIA | Pass@1 | 51.46 | 16 |
| Deep Research | GAIA | Pass@1 | 70.5 | 15 |