Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Deep Research Bench

Benchmarks

Task NameDataset NameSOTA ResultTrend
Deep Research EvaluationDeep Research Bench first training epoch (step 600)
Readability52.09
17
Deep Research EvaluationDeep Research Bench (step 1100)
Readability53.81
16
Showing 2 of 2 rows