| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Web Navigation Question Answering | WebWalker QA | Accuracy 76.5 | 23 |
| Long-context Memory Retrieval and Reasoning | WebWalker 128K | F1 Score 27.44 | 20 |
| Knowledge-Intensive Reasoning | WebWalker | F1 Score 30.5 | 18 |
| Search | WebWalker | Score 59.5 | 7 |
| Web Search | WebWalker | Pass@1 61.7 | 6 |
| Deep Research | WebWalker | F1 Score 33.02 | 4 |