| Dataset Name | SOTA Method | Metric | Trend | |
|---|---|---|---|---|
| BrowseComp standard (full) | | Pass@1 51.5 | 20 | 4d ago |
| BrowseComp | | Success Rate 51.5 | 19 | 4d ago |
| BrowseComp Chinese (full) | | Pass@1 58.1 | 19 | 4d ago |
| Xbench-DS | Qwen3-VL-30B-A3B-ICA | Success Rate 75 | 18 | 4d ago |
| XBench 2505 (full) | NestBrowse-30B-A3B | Pass@1 75 | 17 | 4d ago |
| GAIA 103-question text-only | NestBrowse-30B-A3B | Pass@1 75.7 | 16 | 4d ago |
| GAIA | | Success Rate 70.5 | 13 | 4d ago |
| 20Q Breeds weighted (test) | | Worst-case Weighted Payoff 47.8 | 8 | 4d ago |
| 20Q Common weighted (test) | | Worst-case Weighted Payoff 235.7 | 8 | 4d ago |
| Seal-0 | Qwen3-VL-30B-A3B-ICA | Success Rate 27 | 6 | 4d ago |
| DeepWide Search Benchmark | Claude-Sonnet-4 (TaS) | Col-F1 55.9 | 5 | 4d ago |
| XBench v2510 (full) | NestBrowse-30B-A3B | Pass@1 45 | 2 | 4d ago |