Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Intent Detection | All Posts and Comment Mean | Mean Score 68.51 | 42 |
| Legal Contract Revision | ALL Avg | CQ Score 86.87 | 25 |
| Mathematical Reasoning | All Average | Accuracy 60 | 20 |
| Machine Translation (English to Hindi) | All weighted average (test) | BLEU Score 0.0675 | 14 |
| Machine Translation (Hindi to English) | All weighted average (test) | BLEU Score 10.28 | 14 |
| Point Cloud Quality Assessment | ALL | PLCC 0.913 | 12 |
| Machine Translation | ALL Average of two language pairs in four directions wmt22-comet-da | COMET 85.8 | 12 |
| Organ Segmentation | All 121 classes v1 (test) | DSC 90.49 | 10 |
| Generative Searching | All-50K (test) | HR@1 8.8 | 9 |
| Word Sense Disambiguation | ALL (test) | F1 Score 82 | 8 |
| Binary Graph Classification | All 169 Graphs (5-fold stratified CV) | Accuracy (Test) 75.9 | 6 |
| Word Sense Linking | ALL FULL | Precision 80.4 | 5 |
| Video Action Recognition | All (Avg.) | Base Score 65.5 | 5 |
| Tabular Data Generation | All Default, Shoppers, Adult | Memory Ratio Improvement (%) 13.47 | 4 |
| Wide-angle portrait correction | all (test) | Line Accuracy 66.784 | 4 |
| Aggregate Performance | All Average | Accuracy 40.3 | 3 |
| Word Sense Linking | ALL FULL (test) | Precision 80.4 | 3 |
| Anomaly Detection | All MVTec-AD, VisA, MPDD, BTAD combined | I-AUROC 95.4 | 2 |
| Classification Calibration | All 6 tabular | ΔNLL (%) 0.49 | 1 |
| Decision Making | All Aggregated (UK-based participants) | Final Accuracy 5.2 | 1 |