| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Deep Research Report Generation | DeepResearch Bench | Comprehensiveness: 52.84 | 54 |
| Deep Research | DeepResearch Bench official 100-task-subset 1.0 | RACE Overall: 0.5076 | 24 |
| Report Generation | DeepResearch Bench 2025 (test) | Comprehensiveness: 49.5 | 16 |
| Deep Research | DeepResearch Bench 1.0 (test) | Overall Score: 46.45 | 12 |
| Open-Ended Deep Research | DeepResearch Bench Open-Ended | Overall Score: 52.09 | 11 |
| Open-ended deep research evaluation | DeepResearch Bench 100 PhD-level research tasks | Comprehensiveness: 54.25 | 9 |
| Research Report Generation | DeepResearch Bench RACE framework 1.0 (test) | Overall Score: 49.71 | 7 |
| Clarification Generation | DeepResearch Bench online interactive settings | Intent Precision: 36.44 | 6 |
| Clarification Generation | DeepResearch Bench offline (test) | Quality Score: 2.43 | 4 |