| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Web Navigation Question Answering | WebWalker QA | Accuracy 76.5 | 23 |
| Long-context Memory Retrieval and Reasoning | WebWalker 128K | F1 Score 27.44 | 20 |
| Knowledge-Intensive Reasoning | WebWalker | F1 Score 30.5 | 18 |
| Web-based Agent Task Completion | WebWalker | Success Rate (Config) 53.5 | 10 |
| DeepSearch | WebWalker | Success Rate 47.2 | 9 |
| Agentic Search | WebWalker | Accuracy 72.7 | 9 |
| Search | WebWalker | Score 59.5 | 7 |
| Web Search | WebWalker | Pass@1 61.7 | 6 |
| Web Browsing and Navigation | WebWalker | Score 39.85 | 5 |
| Web Navigation | WebWalker 100 tasks (test) | Success Rate (Easy) 0.125 | 4 |
| Deep Research | WebWalker | F1 Score 33.02 | 4 |