
DeepResearch Bench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
Deep Research Report Generation | DeepResearch Bench | Comprehensiveness: 52.84 | 54
Deep Research | DeepResearch Bench official 100-task-subset 1.0 | RACE Overall: 0.5076 | 24
Report Generation | DeepResearch Bench 2025 (test) | Comprehensiveness: 49.5 | 16
Deep Research | DeepResearch Bench 1.0 (test) | Overall Score: 46.45 | 12
Open-Ended Deep Research | DeepResearch Bench Open-Ended | Overall Score: 52.09 | 11
Open-ended deep research evaluation | DeepResearch Bench 100 PhD-level research tasks | Comprehensiveness: 54.25 | 9
Research Report Generation | DeepResearch Bench RACE framework 1.0 (test) | Overall Score: 49.71 | 7
Clarification Generation | DeepResearch Bench online interactive settings | Intent Precision: 36.44 | 6
Clarification Generation | DeepResearch Bench offline (test) | Quality Score: 2.43 | 4