BrowseComp

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Web Browsing | BrowseComp | Accuracy 73.33 | 52 |
| Agentic Web Browsing | BrowseComp | Pass@1 67.6 | 47 |
| Deep Research | BrowseComp-ZH (BC-zh) original (test) | Pass@1 58.1 | 45 |
| Agentic Web Browsing | BrowseComp-ZH | Pass@1 75.9 | 44 |
| Web research | BrowseComp zh | Accuracy (%) 52.9 | 39 |
| Deep Research | BrowseComp+ | Accuracy 55.33 | 38 |
| Deep Research | BrowseComp | Pass@1 50.9 | 33 |
| Deep Research Task | BrowseComp | Accuracy 67.6 | 29 |
| Deep Search | BrowseComp (test) | Accuracy 49.7 | 27 |
| Agentic | BrowseComp | Score 78.4 | 27 |
| BrowseComp-Plus | BrowseComp-Plus | Accuracy 79.33 | 25 |
| Long-horizon agentic task | BrowseComp-Plus | Performance 77.33 | 24 |
| Long-horizon agentic task | BrowseComp | Performance 71.33 | 24 |
| Deep-search QA | BrowseComp (test) | Pass@1 51.5 | 24 |
| Multi-step navigation and information location | BrowseComp English | Score 54.9 | 22 |
| Multimodal deep search and reasoning | BrowseComp V3 | Success Rate (SR) - Avg 68.03 | 22 |
| Web-based Question Answering | BrowseComp-plus | Accuracy 78.41 | 22 |
| Multi-step navigation and information location | BrowseComp-ZH | Score 68.7 | 21 |
| Deep Research | BrowseComp | Score 74.9 | 21 |
| Web Browsing | BrowseComp-zh | Accuracy 65 | 21 |
| Web Browsing | BrowseComp+ (test) | Accuracy 56.4 | 20 |
| Information-Seeking | BrowseComp standard (full) | Pass@1 51.5 | 20 |
| General AI Assistant Reasoning | BrowseComp-zh (BC-zh) | Pass@1 Accuracy 42.9 | 19 |
| Information-seeking | BrowseComp | Success Rate 51.5 | 19 |
| Information-Seeking | BrowseComp Chinese (full) | Pass@1 58.1 | 19 |
Showing 25 of 71 rows