| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General AI Assistant Tasks | GAIA | Accuracy | 87.4 | 266 |
| General AI Assistant Tasks | GAIA | Avg Performance | 80.61 | 54 |
| Agentic Evaluation | GAIA | Accuracy | 28.12 | 50 |
| Deep Search | GAIA | Accuracy | 81.9 | 37 |
| General AI Assistant Task | GAIA (val) | Level 1 Score | 94.3 | 33 |
| Agentic Benchmarks | GAIA | Execution Time (min) | 1.6 | 25 |
| Deep Search | GAIA text-only (val) | Accuracy | 70.9 | 24 |
| Embodied Agentic | GAIA | Accuracy | 0.672 | 21 |
| Deep Research | GAIA text-only original (test) | Pass@1 | 74.1 | 20 |
| General AI Assistant | GAIA text | GAIA Average Score | 70.5 | 19 |
| Multi-turn Tool Use | GAIA | Pass@1 | 76.4 | 18 |
| General AI Assistant Reasoning | GAIA Full | Accuracy | 60.12 | 18 |
| General AI Assistant Reasoning | GAIA (File/Reasoning/Others) | Accuracy | 56.21 | 18 |
| General AI Assistant Reasoning | GAIA (Web) | Accuracy | 63.33 | 18 |
| Agentic Reasoning | GAIA (val) | Average Score | 86.06 | 17 |
| Inference Time Consumption | GAIA | Latency (Research And Data) | 11.2 | 16 |
| Information-Seeking | GAIA 103-question text-only | Pass@1 | 75.7 | 16 |
| Deep Research | GAIA | Pass@1 | 51.46 | 16 |
| Deep Research | GAIA | Pass@1 | 70.5 | 15 |
| General AI Assistant Tasks | GAIA Level 3 original (test) | Performance (%) | 37.5 | 15 |
| General AI Assistant Tasks | GAIA Level 2 original (test) | Performance (%) | 59.3 | 15 |
| General AI Assistant Tasks | GAIA Level 1 original (test) | Performance (%) | 83.02 | 15 |
| General AI Assistant Tasks | GAIA All levels original (test) | Performance (%) | 63.19 | 15 |
| General Assistant Tasks | GAIA | Success Rate | 46.7 | 15 |
| General AI Assistant Task Completion | GAIA Text-Only | Accuracy | 0.874 | 15 |