ResearchQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Deep Research | ResearchQA | Score: 79.2 | 21 |
| Long-form research | ResearchQA | Score: 79.2 | 18 |
| Agentic Reasoning | ResearchQA (test) | Score: 73.9 | 14 |
| Long-form deep-research answering | ResearchQA Mini | Score: 79.1 | 13 |
| Science Question Answering | ResearchQA | Accuracy (ResearchQA): 85.8 | 13 |
| Agentic Task | ResearchQA | Score: 73.7 | 10 |
| Science Question Answering | ResearchQA Science | Score: 77.31 | 10 |
| Question Answering | ResearchQA (RQA) Artificial Intelligence (test) | Rubrics Score: 79.3 | 6 |