Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

RESEARCHRUBRICS

Benchmarks

Task NameDataset NameSOTA ResultTrend
Research EvaluationResearchRubrics
Accuracy49.74
19
ResearchRubricsResearchRubrics
Accuracy50.97
19
Long-horizon agentic taskResearchRubrics
Performance49.36
18
Research AutomationRESEARCHRUBRICS
Score63.69
5
Deep Research EvaluationResearchRubrics
WQ Score66.6
3
Showing 5 of 5 rows