
BrowseComp-ZH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Deep Search | BrowseComp-ZH (test) | Accuracy 58.1 | 27 |
| Long-horizon agentic tasks | BrowseComp-ZH Our Settings | Pass@1 71.3 | 25 |
| Web Research | BrowseComp-ZH | Pass@1 29.1 | 19 |
| Search | BrowseComp-ZH (test) | Accuracy 68.7 | 17 |
| Web Browsing and Navigation (Chinese) | BrowseComp-ZH | Avg@3 65 | 16 |
| Agentic Web Interaction | BrowseComp-ZH (test) | Pass@1 61.3 | 10 |
| Long-horizon agentic tasks | BrowseComp-ZH Full | Pass@1 65 | 2 |