DROP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reading Comprehension | DROP | DROP Accuracy 88.8 | 111 |
| Reading Comprehension | DROP | F1 Score 92.2 | 73 |
| Reading Comprehension | DROP (dev) | F1 Score 88.1 | 63 |
| Reading Comprehension | DROP (test) | F1 Score 96.42 | 61 |
| Natural Language Reasoning | DROP | Accuracy 88.9 | 33 |
| Reading Comprehension | DROP (test) | F1 Score 76 | 29 |
| Generation | DROP | F1 Score 32.9 | 29 |
| Reasoning | DROP | Score 88.32 | 27 |
| Question Answering | DROP MRQA out-of-domain evaluation | EM 64.9 | 23 |
| Video Reconstruction | Drop | PSNR 35.03 | 21 |
| Discrete Reasoning | DROP | Exact Match (EM) 71.59 | 19 |
| Question Answering | DROP nfl | F1 Score 67.69 | 17 |
| Reading Comprehension | DROP | DROP Score 36.4 | 16 |
| In-context retrieval | DROP | Accuracy 88.6 | 16 |
| Multi-hop QA | DROP (test) | F1 Score 87.9 | 14 |
| Reading Comprehension | DROP (test) | Accuracy 90.8 | 14 |
| Reading Comprehension | DROP MRQA out-of-domain | EM 0.4884 | 14 |
| Grayscale Video Reconstruction | Drop | PSNR 45.46 | 13 |
| Question Answering | DROP (test) | ROUGE 76.78 | 12 |
| Query-based Information Extraction | DROP | F1 Score 64.64 | 12 |
| Question Answering | DROP | F1 Score 87.5 | 11 |
| Grounded Text Generation | DROP history | F1 51.17 | 11 |
| Reading Comprehension | DROP | EM 41.7 | 11 |
| Reading Comprehension | DROP 1.0 (test) | EM 92.38 | 11 |
| Video Snapshot Compressive Imaging | Drop Grayscale | PSNR 47.05 | 10 |

Showing 25 of 52 rows