| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Web Browsing | BrowseComp | Accuracy | 73.33 | 52 |
| Agentic Web Browsing | BrowseComp | Pass@1 | 67.6 | 47 |
| Deep Research | BrowseComp-ZH (BC-zh) original (test) | Pass@1 | 58.1 | 45 |
| Agentic Web Browsing | BrowseComp-ZH | Pass@1 | 75.9 | 44 |
| Web research | BrowseComp zh | Accuracy (%) | 52.9 | 39 |
| Deep Research | BrowseComp+ | Accuracy | 55.33 | 38 |
| Deep Research | BrowseComp | Pass@1 | 50.9 | 33 |
| Deep Research Task | BrowseComp | Accuracy | 67.6 | 29 |
| Deep Search | BrowseComp (test) | Accuracy | 49.7 | 27 |
| Agentic | BrowseComp | Score | 78.4 | 27 |
| BrowseComp-Plus | BrowseComp-Plus | Accuracy | 79.33 | 25 |
| Long-horizon agentic task | BrowseComp-Plus | Performance | 77.33 | 24 |
| Long-horizon agentic task | BrowseComp | Performance | 71.33 | 24 |
| Deep-search QA | BrowseComp (test) | Pass@1 | 51.5 | 24 |
| Multi-step navigation and information location | BrowseComp English | Score | 54.9 | 22 |
| Multimodal deep search and reasoning | BrowseComp V3 | Success Rate (SR) - Avg | 68.03 | 22 |
| Web-based Question Answering | BrowseComp-plus | Accuracy | 78.41 | 22 |
| Multi-step navigation and information location | BrowseComp-ZH | Score | 68.7 | 21 |
| Deep Research | BrowseComp | Score | 74.9 | 21 |
| Web Browsing | BrowseComp-zh | Accuracy | 65 | 21 |
| Web Browsing | BrowseComp+ (test) | Accuracy | 56.4 | 20 |
| Information-Seeking | BrowseComp standard (full) | Pass@1 | 51.5 | 20 |
| General AI Assistant Reasoning | BrowseComp-zh (BC-zh) | Pass@1 Accuracy | 42.9 | 19 |
| Information-seeking | BrowseComp | Success Rate | 51.5 | 19 |
| Information-Seeking | BrowseComp Chinese (full) | Pass@1 | 58.1 | 19 |