| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Research Evaluation | ResearchRubrics | Accuracy 49.74 | 19 |
| ResearchRubrics | ResearchRubrics | Accuracy 50.97 | 19 |
| Long-horizon agentic task | ResearchRubrics | Performance 49.36 | 18 |
| Research Automation | RESEARCHRUBRICS | Score 63.69 | 5 |
| Deep Research Evaluation | ResearchRubrics | WQ Score 66.6 | 3 |