| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reading Comprehension | DROP | DROP Accuracy | 88.8 | 103 |
| Reading Comprehension | DROP (dev) | F1 Score | 88.1 | 63 |
| Reading Comprehension | DROP (test) | F1 Score | 96.42 | 61 |
| Reading Comprehension | DROP | F1 Score | 92.2 | 55 |
| Natural Language Reasoning | DROP | Accuracy | 88.9 | 33 |
| Reading Comprehension | DROP (test) | F1 Score | 76 | 29 |
| Generation | DROP | F1 Score | 32.9 | 29 |
| Reasoning | DROP | Score | 88.32 | 21 |
| Video Reconstruction | Drop | PSNR | 35.03 | 21 |
| Discrete Reasoning | DROP | Exact Match (EM) | 71.59 | 19 |
| Question Answering | DROP nfl | F1 Score | 67.69 | 17 |
| Question Answering | DROP MRQA out-of-domain evaluation | EM | 64.9 | 15 |
| Reading Comprehension | DROP (test) | Accuracy | 90.8 | 14 |
| Reading Comprehension | DROP MRQA out-of-domain | EM | 0.4884 | 14 |
| Grayscale Video Reconstruction | Drop | PSNR | 45.46 | 13 |
| Question Answering | DROP (test) | ROUGE | 76.78 | 12 |
| Query-based Information Extraction | DROP | F1 Score | 64.64 | 12 |
| Question Answering | DROP | F1 Score | 87.5 | 11 |
| Grounded Text Generation | DROP history | F1 | 51.17 | 11 |
| Reading Comprehension | DROP | EM | 41.7 | 11 |
| Reading Comprehension | DROP 1.0 (test) | EM | 92.38 | 11 |
| Video Snapshot Compressive Imaging | Drop Grayscale | PSNR | 47.05 | 10 |
| Question Answering | DROP (dev) | EM | 84.1 | 10 |
| Question Answering | DROP (val) | F1 Score | 80 | 10 |
| Reading Comprehension | DROP Single-Span questions (dev) | EM | 84.2 | 10 |