| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Linguistic Minimal Pair Scoring | BLiMP | Overall Accuracy: 88.9 | 49 |
| Linguistic Minimal Pair Evaluation | BLiMP (test) | NPI lic. (2): 100 | 28 |
| Linguistic Probing | BLiMP | Performance: 64.8 | 10 |
| Linguistic Acceptability | BLiMP English (all) | Accuracy: 81.6 | 9 |
| Syntax | BLiMP | Accuracy: 84.61 | 8 |
| Audio Language Modeling | sBLIMP | Accuracy: 64.7 | 8 |
| Zero-shot Language Modeling | BLiMP (test) | Accuracy: 79.6 | 8 |
| Syntactic Generalization | BLiMP (test) | BLiMP Accuracy: 0.773 | 8 |
| Semantic Anomaly Detection | BLIMP Animacy | Accuracy: 78.7 | 6 |
| Morphosyntax Anomaly Detection | BLIMP Det-Noun | Accuracy: 99.9 | 6 |
| Morphosyntax Anomaly Detection | BLIMP Subject-Verb | Accuracy: 97.1 | 6 |
| Linguistic Analysis | BLiMP | Accuracy: 60.5 | 4 |
| Linguistic Competence Classification | BLiMP | Accuracy: 83 | 3 |