| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Deep Research | ResearchQA | Score 79.2 | 21 |
| Long-form research | ResearchQA | Score 79.2 | 18 |
| Agentic Reasoning | ResearchQA (test) | Score 73.9 | 14 |
| Long-form deep-research answering | ResearchQA Mini | Score 79.1 | 13 |
| Science Question Answering | ResearchQA | Accuracy (ResearchQA) 85.8 | 13 |
| Agentic Task | ResearchQA | Score 73.7 | 10 |
| Science Question Answering | ResearchQA Science | Score 77.31 | 10 |
| Question Answering | ResearchQA (RQA) Artificial Intelligence (test) | Rubrics Score 79.3 | 6 |