| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Coreference Resolution | WSC | Accuracy 98.5 | 116 |
| Classification | WSC | Accuracy 80 | 59 |
| Coreference Resolution | WSC | Accuracy@1 85.2 | 33 |
| Language | WSC | Score 86.51 | 30 |
| Coreference Resolution | WSC | Loss 0.02 | 20 |
| Coreference Resolution | WSC (test) | Accuracy 82.7 | 19 |
| Pronoun Disambiguation | WSC (test) | Accuracy (Single) 78.8 | 14 |
| Coreference Resolution | WSC | Accuracy 65.4 | 13 |
| Commonsense Reasoning | WSC | Accuracy 80.6 | 12 |
| Reasoning | WSC Ambiguity-Augmented (200 samples) | Accuracy@1 85.2 | 11 |
| Coreference Resolution | WSC ambiguity-augmented | Accuracy 82.6 | 11 |
| Winograd Schema Challenge | WSC | Accuracy 43.3 | 8 |
| Coreference Resolution | WSC SuperGLUE (test) | Accuracy (Test) 65.65 | 8 |
| Coreference Resolution | WSC standard (test) | Accuracy 56.7 | 8 |
| Sleep staging | WSC | AUC 98.1 | 7 |
| Coreference Resolution | WSC | F1 Score 58.36 | 7 |
| Coreference Resolution | WSC | Accuracy (0-shot) 75.7 | 6 |
| Coreference Resolution | WSC (dev) | Accuracy 85.6 | 6 |
| Coreference Resolution | WSC273 | Accuracy 82.8 | 5 |