| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| BrowseComp | | Accuracy 85.9 | 68 | 16d ago |
| BrowseComp-zh | Argus-35B-A3B (Parallel) | Accuracy 83.4 | 34 | 16d ago |
| BC-plus | | EM 29.6 | 30 | 6d ago |
| BrowseComp+ (test) | | Accuracy 56.4 | 20 | 1mo ago |
| BrowseComp (official) | Tendem’s AI agent | Exact Match 71 | 5 | 3mo ago |
| BrowseComp-Plus | | Pass 72 | 4 | 1mo ago |
| WebArena | R2D2 | Accuracy 27.3 | 3 | 3mo ago |
| Custom Tasks | GA | Score 57.7 | 2 | 1mo ago |
| WebCanvas | GA | Primary Score 0.834 | 2 | 1mo ago |