Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

DeepResearch

Benchmarks

Task NameDataset NameSOTA ResultTrend
Deep ResearchDeepResearch Bench
RACE Overall53.08
22
Judge Agreement AccuracyDeepResearch 1319 queries (test)
Agreement Accuracy74.5
19
Long-form deep researchDeepResearch Bench (test)
Overall Score48.24
13
Question AnsweringDeepResearch
HotpotQA Score44.7
12
Showing 4 of 4 rows