Deep Research on BrowseComp-Plus (EM, F1)
[Chart: Exact Match (EM) by method. Highlighted result: ZipRL-8B, 32.33 EM, May 27, 2026. Selectable metrics: Exact Match (EM), F1 Score.]
Updated 6d ago
Evaluation Results
Method                 Avg Turns   Date      Exact Match (EM)   F1 Score
ZipRL-8B               13.8        2026.05   32.33              40.82
GPT-4o-ReAct           6.0         2026.05   25.2               31.61
Qwen3-235B-ReAct       2.1         2026.05   20.67              23.55
DeepSeek-v3.2-ReAct    3.1         2026.05   8.8                13.29
NestBrowse-8B          1.3         2026.05   8.8                14.53