BrowseComp+

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Web Browsing and Tool Use | BrowseComp+ original (test) | Performance (%): 38.72 | 15
Web Browsing Reasoning | BrowseComp+ | Avg@8 Accuracy: 11 | 7
Scaling Model Validation | BrowseComp-Plus Out-of-sample (val) | MAE: 0.071 | 1