
DROP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reading Comprehension | DROP | DROP Accuracy 92.28 | 129 |
| Reading Comprehension | DROP | F1 Score 92.2 | 96 |
| Reading Comprehension | DROP (test) | F1 Score 96.42 | 76 |
| Reading Comprehension | DROP (dev) | F1 Score 88.1 | 63 |
| Question Answering | DROP | F1 Score 87.5 | 45 |
| Generation | DROP | F1 Score 32.9 | 43 |
| Natural Language Reasoning | DROP | Accuracy 89.62 | 43 |
| Reasoning | DROP | Score 89.27 | 42 |
| Reading Comprehension | DROP (test) | F1 Score 76 | 29 |
| Reading Comprehension | DROP | F1 Score 69.18 | 25 |
| Reading Comprehension | DROP | DROP Score 48.68 | 25 |
| Discrete Reasoning | DROP | Exact Match (EM) 71.59 | 25 |
| Reading Comprehension | DROP (test) | Accuracy 90.8 | 23 |
| Question Answering | DROP MRQA out-of-domain evaluation | EM 64.9 | 23 |
| Video Reconstruction | Drop | PSNR 35.03 | 21 |
| Reading Comprehension | DROP | Loss 0.4 | 20 |
| Instruction-following | DROP | DROP Score 51.53 | 20 |
| Question Answering | DROP nfl | F1 Score 67.69 | 17 |
| In-context retrieval | DROP | Accuracy 88.6 | 16 |
| Multi-hop QA | DROP (test) | F1 Score 87.9 | 14 |
| Reading Comprehension | DROP MRQA out-of-domain | EM 0.4884 | 14 |
| Grayscale Video Reconstruction | Drop | PSNR 45.46 | 13 |
| Question Answering | DROP (test) | ROUGE 76.78 | 12 |
| Query-based Information Extraction | DROP | F1 Score 64.64 | 12 |
| Grounded Text Generation | DROP history | F1 51.17 | 11 |
Showing 25 of 65 rows