| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Deep Research | BrowseComp-ZH (BC-zh) original (test) | Pass@1 58.1 | 45 |
| Deep Research Task | BrowseComp | Accuracy 67.6 | 29 |
| Deep Search | BrowseComp (test) | Accuracy 49.7 | 27 |
| Agentic | BrowseComp | Score 78.4 | 27 |
| Deep-search QA | BrowseComp (test) | Pass@1 51.5 | 24 |
| Multimodal deep search and reasoning | BrowseComp V3 | Success Rate (SR) - Avg 68.03 | 22 |
| Deep Research | BrowseComp | Score 74.9 | 21 |
| Agentic Web Browsing | BrowseComp | Pass@1 67.6 | 21 |
| Information-Seeking | BrowseComp standard (full) | Pass@1 51.5 | 20 |
| Information-seeking | BrowseComp | Success Rate 51.5 | 19 |
| Information-Seeking | BrowseComp Chinese (full) | Pass@1 58.1 | 19 |
| Deep Research | BrowseComp+ | Accuracy 55.33 | 19 |
| Multi-turn tool use | BrowseComp-ZH | Pass@1 58.1 | 18 |
| Multi-turn tool use | BrowseComp | Pass@1 50.9 | 18 |
| Agentic Web Browsing | BrowseComp-ZH | Pass@1 75.9 | 18 |
| Deep research | BrowseComp-zh | Accuracy 66.6 | 18 |
| Deep Search | BrowseComp-ZH | Accuracy 63.7 | 17 |
| Deep Research | BrowseComp-zh | BrowseComp-zh Score 81.3 | 16 |
| Deep Research | BrowseComp-ZH | Pass@1 58.1 | 15 |
| Deep Research | BrowseComp | Pass@1 50.9 | 15 |
| Deep Search | BrowseComp-Plus | Score 70 | 13 |
| Web Browsing and Interaction | Browsecomp | Accuracy 51.5 | 12 |
| Agentic Search | BrowseComp-ZH (test) | LJFT 21.45 | 12 |
| Tool Use | BrowseComp Domains (Domain-specific (9) + Full Search) | Accuracy 27.8 | 10 |
| Tool Use | BrowseComp Domain-specific (9) Search | Accuracy 22.5 | 10 |