
AVG

Benchmarks

| Task | Dataset | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | AVG. | EM | 47.5 | 28 |
| Change Detection | Avg across SYSU, LEVIR, GVLM, CLCD, OSCD | Precision | 84.8 | 23 |
| Low-Light Image Enhancement | AVG. DICM, MEF, LIME, NPE, VV | NIQE | 3.589 | 17 |
| Reasoning Performance (Aggregate) | AVG | TPF | 351 | 14 |
| Question Answering | AVG. Aggregate of NQ, TQA, HQA, 2WIKI (test) | EM | 42.5 | 14 |
| General Reasoning | AVG Reasoning Suite | Accuracy | 61.56 | 12 |
| Selective Classification | Avg (all) | AURC (10^-2 scale) | 0.215 | 11 |
| Selective Classification | Avg 1K | AURC (10^-2 scale) | 0.248 | 11 |
| Geolocation | AVG (test) | City Acc. (25 km) | 8.3 | 10 |
| Detection | AVG | AUC | 0.897 | 10 |
| Multi-task Language Understanding | AVG Across All Benchmarks | Throughput | 12.89 | 8 |
| Scene Text Recognition | AVG 12 benchmarks | Word Accuracy | 91.33 | 8 |
| Preference Alignment | Avg. | GRA | 70.6 | 4 |
| Machine Translation | Avg (News, Flores, Subtitle, Travel) German-English (test aggregate) | DA | 87.77 | 4 |