Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

RESEARCHRUBRICS

Benchmarks

Task NameDataset NameSOTA ResultTrend
Research EvaluationResearchRubrics
Accuracy49.74
19
ResearchRubricsResearchRubrics
Accuracy50.97
19
Long-horizon agentic taskResearchRubrics
Performance49.36
18
Long-form researchResearchRubrics
Score61.5
16
Research AutomationRESEARCHRUBRICS
Score63.69
5
Deep Research EvaluationResearchRubrics
WQ Score66.6
3
Showing 6 of 6 rows