| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Linguistic Minimal Pair Scoring | BLiMP | Overall Accuracy | 88.9 | 49 |
| Linguistic Minimal Pair Evaluation | BLiMP (test) | NPI lic. (2) | 100 | 28 |
| Linguistic Minimal Pairs | BLiMP 10% (test) | BLiMP 10% Accuracy | 76.1 | 11 |
| Linguistic Probing | BLiMP | Performance | 64.8 | 10 |
| Linguistic Acceptability | BLiMP English (all) | Accuracy | 81.6 | 9 |
| Acceptability Judgment | BLiMP-NL Dutch (test) | AUC | 73 | 8 |
| Syntax | BLiMP | Accuracy | 84.61 | 8 |
| Audio Language Modeling | sBLIMP | Accuracy | 64.7 | 8 |
| Zero-shot Language Modeling | BLiMP (test) | Accuracy | 79.6 | 8 |
| Syntactic Generalization | BLiMP (test) | BLiMP Accuracy | 0.773 | 8 |
| Grammaticality Judgment | BLiMP | BLiMP Grammaticality Score | 76.4 | 6 |
| Linguistic Minimal Pairs Evaluation | BLiMP | Accuracy | 80.1 | 6 |
| Semantic Anomaly Detection | BLIMP Animacy | Accuracy | 78.7 | 6 |
| Morphosyntax Anomaly Detection | BLIMP Det-Noun | Accuracy | 99.9 | 6 |
| Morphosyntax Anomaly Detection | BLIMP Subject-Verb | Accuracy | 97.1 | 6 |
| Linguistic Analysis | BLiMP | Accuracy | 60.5 | 4 |
| Syntactic Generalization | BLiMP 10% subset | Accuracy (10% BLiMP) | 83.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP principle_A_c_command | Accuracy | 50.4 | 3 |
| Linguistic Minimal Pair Classification | BLiMP coordinate_structure_constraint_complex_left_branch | Accuracy | 62.4 | 3 |
| Linguistic Minimal Pair Classification | BLiMP left branch island echo question | Accuracy | 63.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP sentential_subject_island | Accuracy | 52.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP principle_A_reconstruction | Accuracy | 78.9 | 3 |
| Linguistic Minimal Pair Classification | BLiMP matrix_question_npi_licensor_present | Accuracy | 44.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP wh_vs_that_with_gap_long_distance | Accuracy | 41.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP existential_there_quantifiers_2 | Accuracy | 51.8 | 3 |