
Targeted Syntactic Evaluation of Language Models

About

We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this dataset, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.
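The evaluation criterion described above (the grammatical sentence should receive the higher probability) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names `minimal_pair_accuracy` and `train_bigram_scorer`, the toy corpus, and the add-one-smoothed bigram model standing in for the LSTM are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    log_prob: Callable[[str], float],
) -> float:
    """Fraction of (grammatical, ungrammatical) pairs in which the
    scorer assigns a strictly higher log-probability to the
    grammatical sentence."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one sentence pair")
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)


def train_bigram_scorer(corpus: List[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for a real language model: an
    add-one-smoothed bigram model over whitespace tokens."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])          # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    vocab_size = len(vocab) + 1                     # one extra slot for unseen tokens

    def log_prob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(a, b)] + 1) / (context_counts[a] + vocab_size))
            for a, b in zip(tokens, tokens[1:])
        )

    return log_prob


# Toy data, invented for illustration (subject-verb agreement pairs).
corpus = ["the author laughs", "the authors laugh"]
pairs = [
    ("the author laughs", "the author laugh"),
    ("the authors laugh", "the authors laughs"),
]
scorer = train_bigram_scorer(corpus)
accuracy = minimal_pair_accuracy(pairs, scorer)
print(accuracy)
```

Any model that can return a sentence log-probability can be passed as `log_prob`. The bigram scorer only sees adjacent words, so it would not be expected to handle the long-distance dependencies that the dataset targets.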

Rebecca Marvin, Tal Linzen • 2018

Related benchmarks

Task | Dataset | Result | Rank
Acceptability Judgment | BLiMP-NL Dutch (test) | AUC 63 | 8
Acceptability Judgment | ItaCoLA Italian (test) | AUC 0.61 | 8
Acceptability Judgment | JCoLA (Japanese) (test) | AUC 0.59 | 8
Acceptability Judgment | SLING Chinese (test) | AUC 58 | 8
Acceptability Judgment | ScaLA sv Swedish (test) | AUC 66 | 8
Acceptability Judgment | RuCoLA Russian (test) | AUC 47 | 8