
BLiMP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Linguistic Minimal Pair Scoring | BLiMP | Overall Accuracy: 88.9 | 49 |
| Linguistic Minimal Pair Evaluation | BLiMP (test) | NPI lic. (2): 100 | 28 |
| Linguistic Probing | BLiMP | Performance: 64.8 | 10 |
| Linguistic Acceptability | BLiMP English (all) | Accuracy: 81.6 | 9 |
| Syntax | BLiMP | Accuracy: 84.61 | 8 |
| Audio Language Modeling | sBLIMP | Accuracy: 64.7 | 8 |
| Zero-shot Language Modeling | BLiMP (test) | Accuracy: 79.6 | 8 |
| Syntactic Generalization | BLiMP (test) | BLiMP Accuracy: 0.773 | 8 |
| Semantic Anomaly Detection | BLIMP Animacy | Accuracy: 78.7 | 6 |
| Morphosyntax Anomaly Detection | BLIMP Det-Noun | Accuracy: 99.9 | 6 |
| Morphosyntax Anomaly Detection | BLIMP Subject-Verb | Accuracy: 97.1 | 6 |
| Linguistic Analysis | BLiMP | Accuracy: 60.5 | 4 |
| Computation Time Analysis | Blimp Circle C trajectory | Computation Time (ms): 0.96 | 3 |
| Computation Time Analysis | Blimp Helix B trajectory | Computation Time (ms): 0.95 | 3 |
| Computation Time Analysis | Blimp Helix A trajectory | Computation Time (ms): 0.94 | 3 |
| Computation Time Analysis | Blimp Lemniscate C trajectory | Computation Time (ms): 0.86 | 3 |
| Computation Time Analysis | Blimp Lemniscate B trajectory | Computation Time (ms): 0.87 | 3 |
| Computation Time Analysis | Blimp Lemniscate A trajectory | Computation Time (ms): 0.82 | 3 |
| Computation Time Analysis | Blimp Circle B trajectory | Computation Time (ms): 0.82 | 3 |
| Computation Time Analysis | Blimp Circle A trajectory | Computation Time (ms): 0.84 | 3 |
| Trajectory Tracking | Blimp (experimental trials) | Avg CPU Energy Expenditure (µJ): 1.25 | 3 |
| Linguistic Competence Classification | BLiMP | Accuracy: 83 | 3 |