| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | AdversarialQA (val) | EM 38.5 | 19 |
| Question Answering | AdversarialQA | F1 Score 56.1 | 17 |
| Question Answering | AdversarialQA dBERT | Accuracy 39.51 | 14 |
| Question Answering | AdversarialQA dRoberta | Accuracy 28.05 | 10 |
| Safety Evaluation | ADVERSARIALQA | Chinese Accuracy 43.75 | 8 |
| Domain Shift Extractive Question Answering | SQuAD -> AdversarialQA (test) | ECE 0.075 | 6 |
| Question Answering | AdversarialQA dBiDAF | Accuracy 55.12 | 6 |