| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Web Browsing | BrowseComp | Accuracy 85.9 | 68 |
| Agentic Web Browsing | BrowseComp-ZH | Pass@1 75.9 | 52 |
| Deep Research | BrowseComp | Score 74.9 | 47 |
| Agentic Web Browsing | BrowseComp | Pass@1 67.6 | 47 |
| Deep Research | BrowseComp-ZH (BC-zh) original (test) | Pass@1 58.1 | 45 |
| Web research | BrowseComp zh | Accuracy (%) 52.9 | 39 |
| Deep Research | BrowseComp+ | Accuracy 55.33 | 38 |
| Deep Search | BrowseComp-ZH | Accuracy 66.6 | 35 |
| Web Browsing | BrowseComp-zh | Accuracy 83.4 | 34 |
| Deep Research | BrowseComp | Pass@1 50.9 | 33 |
| Deep Research Task | BrowseComp | Accuracy 67.6 | 29 |
| Deep Search | BrowseComp (test) | Accuracy 49.7 | 27 |
| Agentic | BrowseComp | Score 78.4 | 27 |
| Web Task Reasoning | BrowseComp (test) | Pass@1 48.7 | 25 |
| BrowseComp-Plus | BrowseComp-Plus | Accuracy 79.33 | 25 |
| Question Answering | BrowseComp-Plus | Accuracy (Avg) 88.33 | 25 |
| Web-search QA | BrowseComp-VL | Pass@1 54.9 | 24 |
| Long-horizon agentic task | BrowseComp-Plus | Performance 77.33 | 24 |
| Long-horizon agentic task | BrowseComp | Performance 71.33 | 24 |
| Deep-search QA | BrowseComp (test) | Pass@1 51.5 | 24 |
| Deep Search | Browsecomp | Accuracy 52 | 24 |
| Multi-step navigation and information location | BrowseComp English | Score 54.9 | 22 |
| Multimodal deep search and reasoning | BrowseComp V3 | Success Rate (SR) - Avg 68.03 | 22 |
| Web-based Question Answering | BrowseComp-plus | Accuracy 78.41 | 22 |
| Multi-agent system task solving | browsecomp | Accuracy 74.5 | 21 |