BrowseComp-ZH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Deep Search | BrowseComp-ZH (test) | Accuracy: 58.1 | 27 |
| Long-horizon agentic tasks | BrowseComp-ZH Our Settings | Pass@1: 71.3 | 25 |
| Web Research | BrowseComp-ZH | Pass@1: 29.1 | 19 |
| Agentic Web Interaction | BrowseComp-ZH (test) | Pass@1: 61.3 | 10 |
| Long-horizon agentic tasks | BrowseComp-ZH Full | Pass@1: 65 | 2 |