| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Classification | MultiRC | Accuracy 71.1 | 29 |
| Multi-Sentence Reading Comprehension | MultiRC | F1 82.72 | 16 |
| Explanation Evaluation | MultiRC (test) | Sufficiency 13.19 | 16 |
| Question Answering | MultiRC | F1 Score 87.82 | 14 |
| Reading Comprehension | MultiRC | F1 Score 88.2 | 13 |
| Machine Reading Comprehension | MultiRC (dev) | F1 Score 77.5 | 10 |
| Reading Comprehension | MultiRC | Total Communication Time 13,150 | 9 |
| Reading Comprehension | MultiRC | MultiRC Accuracy 72.9 | 9 |
| Reading Comprehension | MultiRC | Accuracy (0-shot) 10.3 | 6 |
| Classification | MultiRC Dir alpha=0.1 | Generalized Accuracy 72.53 | 5 |
| Reading Comprehension | MultiRC Dir alpha=0.1 Standard | Personalized Accuracy (Acc_p) 75.21 | 5 |
| Machine Reading Comprehension | MultiRC SuperGLUE (test) | EM 27.2 | 5 |
| Text Classification | MultiRC ERASER (test) | Weighted Avg F1 (MultiRC) 67 | 5 |
| Question Answering | MultiRC SuperGLUE (dev) | Accuracy 68.67 | 4 |