DRBench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Domain Reasoning | DRBench (test) | Score | 42.9 | 14 |
| Large Vision-Language Model Evaluation | DRBench BS | MCQ Score | 29.68 | 14 |
| Large Vision-Language Model Evaluation | DRBench S Subset | MCQ Accuracy | 47.22 | 14 |
| Large Vision-Language Model Evaluation | DRBench B | MCQ Score | 27.04 | 14 |
| Agentic Task | DRBench | Score | 43 | 10 |
| Citation URL Validity Analysis | DRBench | Non-resolving Rate | 5.4 | 10 |