| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| Deep Research Bench (first training epoch, step 600) | Tournament-GRPO | Readability | 52.09 | 17 | 7d ago |
| Deep Research Bench (step 1100) | Tournament-GRPO | Readability | 53.81 | 16 | 7d ago |
| Aggregate | | WQ | 63.97 | 3 | 3mo ago |
| ResearchRubrics | | WQ Score | 66.6 | 3 | 3mo ago |
| LiveResearchBench | | WQ | 61.71 | 3 | 3mo ago |
| DeepResearchBench | Tongyi Deep Research | WQ | 63.95 | 3 | 3mo ago |