| Dataset Name | SOTA Method | Metric | Results | Updated | Trend |
|---|---|---|---|---|---|
| Deep Search Tasks (test) | SALE w/o memory | Pass@1 91.3 | 42 | 4d ago | |
| gaia | MiroThinker-v1.0-72B | Accuracy 81.9 | 37 | 2d ago | |
| BrowseComp-ZH (test) | | Accuracy 58.1 | 27 | 4d ago | |
| BrowseComp (test) | | Accuracy 49.7 | 27 | 4d ago | |
| xbench DeepSearch (test) | Tongyi-DeepResearch | Accuracy 75 | 26 | 4d ago | |
| GAIA text-only (val) | Tongyi-DeepResearch | Accuracy 70.9 | 24 | 4d ago | |
| BrowseComp-ZH | TaS | Accuracy 63.7 | 17 | 4d ago | |
| xBench DeepSearch (05) | | Score 75 | 14 | 4d ago | |
| HLE text-only | | Score 40.8 | 14 | 4d ago | |
| Browse Comp-ZH | | Score 65 | 14 | 4d ago | |
| Browse Comp | | Score 67.6 | 14 | 4d ago | |
| GAIA text-only | | Score 0.757 | 14 | 4d ago | |
| X-Bench | | Score (%) 75 | 14 | 4d ago | |
| BrowseComp-Plus | | Score 70 | 13 | 4d ago | |
| SEAL 0 | Nanbeige4.1-3B | Score 41.44 | 11 | 4d ago | |
| xBench DeepSearch-10 | Nanbeige4.1-3B | Score 39 | 8 | 4d ago | |
| Average webw., hle, gaia | Qwen3-8B + TEPOdense | Accuracy 9.87 | 7 | 4d ago | |
| Browsecomp | SAGE | Accuracy 2.6 | 6 | 2d ago | |
| hle | Musique | Accuracy 8 | 6 | 2d ago | |
| xbench DeepSearch (leaderboard) | - | - | 0 | 4d ago | |
| webw. | - | - | 0 | 4d ago | |