
BrowseComp

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Web Browsing | BrowseComp | Accuracy 85.9 | 68
Agentic Web Browsing | BrowseComp-ZH | Pass@1 75.9 | 52
Deep Research | BrowseComp | Score 74.9 | 47
Agentic Web Browsing | BrowseComp | Pass@1 67.6 | 47
Deep Research | BrowseComp-ZH (BC-zh) original (test) | Pass@1 58.1 | 45
Web research | BrowseComp zh | Accuracy (%) 52.9 | 39
Deep Research | BrowseComp+ | Accuracy 55.33 | 38
Deep Search | BrowseComp-ZH | Accuracy 66.6 | 35
Web Browsing | BrowseComp-zh | Accuracy 83.4 | 34
Deep Research | BrowseComp | Pass@1 50.9 | 33
Deep Research Task | BrowseComp | Accuracy 67.6 | 29
Deep Search | BrowseComp (test) | Accuracy 49.7 | 27
Agentic | BrowseComp | Score 78.4 | 27
Web Task Reasoning | BrowseComp (test) | Pass@1 48.7 | 25
BrowseComp-Plus | BrowseComp-Plus | Accuracy 79.33 | 25
Question Answering | BrowseComp-Plus | Accuracy (Avg) 88.33 | 25
Web-search QA | BrowseComp-VL | Pass@1 54.9 | 24
Long-horizon agentic task | BrowseComp-Plus | Performance 77.33 | 24
Long-horizon agentic task | BrowseComp | Performance 71.33 | 24
Deep-search QA | BrowseComp (test) | Pass@1 51.5 | 24
Deep Search | Browsecomp | Accuracy 52 | 24
Multi-step navigation and information location | BrowseComp English | Score 54.9 | 22
Multimodal deep search and reasoning | BrowseComp V3 | Success Rate (SR) - Avg 68.03 | 22
Web-based Question Answering | BrowseComp-plus | Accuracy 78.41 | 22
Multi-agent system task solving | browsecomp | Accuracy 74.5 | 21
Showing 25 of 94 rows