
BrowseComp+

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Deep research agents / Multi-step reasoning | BrowseComp-Plus OOD | Success Rate (SR): 54.6 | 24 |
| Long-context reasoning | BrowseComp+ 1K documents | Accuracy: 94.6 | 16 |
| Web Browsing and Tool Use | BrowseComp+ original (test) | Performance (%): 38.72 | 15 |
| Web Browsing Reasoning | BrowseComp+ | Avg@8 Accuracy: 11 | 7 |
| Scaling Model Validation | BrowseComp-Plus Out-of-sample (val) | MAE: 0.071 | 1 |