| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Amazon | GEMS | Hit Rate@5 | 83.99 | 15 | 1mo ago |
| BrowseComp | | Score | 67.6 | 11 | 1mo ago |
| BrowseComp-ZH | | Score | 81.3 | 10 | 1mo ago |
| XBench | OpenSeeker-v1-Data-11.7k | Score | 74 | 9 | 1mo ago |
| HLE text | | Score | 45.8 | 7 | 1mo ago |
| WebWalker | Qwen3-235B | Score | 59.5 | 7 | 1mo ago |
| Frames | Qwen3-235B | Score | 70.5 | 7 | 1mo ago |
| Multi-agent Simulation (averaged across 6 impairment dimensions) | | AURC (% of max) | 100 | 5 | 25d ago |
| Multi-agent Communication Environment (test) | | Mean Normalized Performance Drop | 0 | 5 | 25d ago |
| Large environment | COMRES-VLM | Average Completion Time (timesteps) | 162.24 | 3 | 1mo ago |
| Medium environment | COMRES-VLM | Average Completion Time (timesteps) | 104.25 | 3 | 1mo ago |
| Small environment | COMRES-VLM | Average Completion Time (timesteps) | 63.21 | 3 | 1mo ago |
| MAT-Search | Qwen2.5-VL-3B | F1 Score | 27.1 | 2 | 1mo ago |