
BLiMP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Linguistic Minimal Pair Scoring | BLiMP | Overall Accuracy: 88.9 | 49 |
| Linguistic Minimal Pair Evaluation | BLiMP (test) | NPI lic. (2): 100 | 28 |
| Linguistic Minimal Pairs | BLiMP 10% (test) | BLiMP 10% Accuracy: 76.1 | 11 |
| Linguistic Probing | BLiMP | Performance: 64.8 | 10 |
| Linguistic Acceptability | BLiMP English (all) | Accuracy: 81.6 | 9 |
| Acceptability Judgment | BLiMP-NL Dutch (test) | AUC: 73 | 8 |
| Syntax | BLiMP | Accuracy: 84.61 | 8 |
| Audio Language Modeling | sBLIMP | Accuracy: 64.7 | 8 |
| Zero-shot Language Modeling | BLiMP (test) | Accuracy: 79.6 | 8 |
| Syntactic Generalization | BLiMP (test) | BLiMP Accuracy: 0.773 | 8 |
| Grammaticality Judgment | BLiMP | BLiMP Grammaticality Score: 76.4 | 6 |
| Linguistic Minimal Pairs Evaluation | BLiMP | Accuracy: 80.1 | 6 |
| Semantic Anomaly Detection | BLIMP Animacy | Accuracy: 78.7 | 6 |
| Morphosyntax Anomaly Detection | BLIMP Det-Noun | Accuracy: 99.9 | 6 |
| Morphosyntax Anomaly Detection | BLIMP Subject-Verb | Accuracy: 97.1 | 6 |
| Linguistic Analysis | BLiMP | Accuracy: 60.5 | 4 |
| Syntactic Generalization | BLiMP 10% subset | Accuracy (10% BLiMP): 83.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP principle_A_c_command | Accuracy: 50.4 | 3 |
| Linguistic Minimal Pair Classification | BLiMP coordinate_structure_constraint_complex_left_branch | Accuracy: 62.4 | 3 |
| Linguistic Minimal Pair Classification | BLiMP left branch island echo question | Accuracy: 63.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP sentential_subject_island | Accuracy: 52.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP principle_A_reconstruction | Accuracy: 78.9 | 3 |
| Linguistic Minimal Pair Classification | BLiMP matrix_question_npi_licensor_present | Accuracy: 44.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP wh_vs_that_with_gap_long_distance | Accuracy: 41.2 | 3 |
| Linguistic Minimal Pair Classification | BLiMP existential_there_quantifiers_2 | Accuracy: 51.8 | 3 |
Showing 25 of 37 rows
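
This page does not say how each accuracy figure was computed, and some rows use other metrics (AUC, "Performance"). As a rough illustration only, the usual BLiMP convention counts a minimal pair as correct when the model scores the acceptable sentence above the unacceptable one, and reports the share of correct pairs. The sketch below assumes that convention. The names `minimal_pair_accuracy`, `toy_score`, `TOY_SCORES`, and `PAIRS` are hypothetical, and the scores are made-up numbers standing in for a real language model's log-probabilities.

```python
from typing import Callable, Iterable, Tuple

Pair = Tuple[str, str]  # (acceptable sentence, unacceptable sentence)


def minimal_pair_accuracy(pairs: Iterable[Pair],
                          score: Callable[[str], float]) -> float:
    """Return the percentage of pairs where score(good) > score(bad)."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return 100.0 * correct / len(pairs)


# Hypothetical scores standing in for model log-probabilities (not real data).
TOY_SCORES = {
    "The cats sleep.": -10.0,
    "The cats sleeps.": -14.0,
    "No dog has ever barked.": -12.0,
    "Some dog has ever barked.": -11.5,
}


def toy_score(sentence: str) -> float:
    """Look up the made-up score for a sentence."""
    return TOY_SCORES[sentence]


PAIRS = [
    ("The cats sleep.", "The cats sleeps."),
    ("No dog has ever barked.", "Some dog has ever barked."),
]

print(minimal_pair_accuracy(PAIRS, toy_score))  # 50.0
```

In practice `score` would wrap a model's sentence log-probability. Under this pairwise scheme a model that guesses at random scores about 50%, which is a useful reference point for the sub-50 entries in the table.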