
How is BERT surprised? Layerwise detection of linguistic anomalies

About

Transformer language models have shown remarkable ability to detect when a word is anomalous in context, but likelihood scores offer no information about the cause of the anomaly. In this work, we use Gaussian models for density estimation at intermediate layers of three language models (BERT, RoBERTa, and XLNet), and evaluate our method on BLiMP, a grammaticality judgement benchmark. In lower layers, surprisal is highly correlated with low token frequency, but this correlation diminishes in upper layers. Next, we gather datasets of morphosyntactic, semantic, and commonsense anomalies from psycholinguistic studies; we find that the best-performing model, RoBERTa, exhibits surprisal at earlier layers when the anomaly is morphosyntactic than when it is semantic, while commonsense anomalies do not exhibit surprisal at any intermediate layer. These results suggest that language models employ separate mechanisms to detect different types of linguistic anomalies.
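The layerwise method described above can be sketched with a minimal NumPy example: fit a multivariate Gaussian to token embeddings drawn from one intermediate layer, then score a new token's surprisal by its (squared) Mahalanobis distance, which is proportional to the Gaussian negative log-likelihood. This is an illustrative sketch, not the authors' implementation; the function names and the synthetic stand-in embeddings are assumptions.

```python
import numpy as np

def fit_gaussian(embeddings):
    """Fit a multivariate Gaussian (mean + inverse covariance) to
    token embeddings from one intermediate layer."""
    mu = embeddings.mean(axis=0)
    centered = embeddings - mu
    cov = centered.T @ centered / len(embeddings)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize for invertibility
    return mu, np.linalg.inv(cov)

def surprisal(token_emb, mu, cov_inv):
    """Squared Mahalanobis distance: up to constants, the Gaussian
    negative log-likelihood of this token's embedding."""
    d = token_emb - mu
    return float(d @ cov_inv @ d)

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 16))        # stand-in for layer-l token embeddings
mu, cov_inv = fit_gaussian(train)

normal_tok = rng.normal(size=16)           # in-distribution token
anomalous_tok = rng.normal(size=16) + 5.0  # shifted, i.e. anomalous, token
print(surprisal(normal_tok, mu, cov_inv) < surprisal(anomalous_tok, mu, cov_inv))
```

In the paper's setting, a separate Gaussian is fit per layer, so comparing which layer first assigns high surprisal to an anomalous token is what distinguishes morphosyntactic from semantic anomalies.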

Bai Li, Zining Zhu, Guillaume Thomas, Yang Xu, Frank Rudzicz • 2021

Related benchmarks

Task                           | Dataset                          | Result (Accuracy, %) | Rank
Commonsense Anomaly Detection  | Warren Commonsense               | 75.0                 | 6
Morphosyntax Anomaly Detection | BLiMP Subject-Verb               | 97.1                 | 6
Morphosyntax Anomaly Detection | Osterhout and Nicol Morphosyntax | 100.0                | 6
Semantic Anomaly Detection     | BLiMP Animacy                    | 76.7                 | 6
Morphosyntax Anomaly Detection | BLiMP Det-Noun                   | 98.3                 | 6
Semantic Anomaly Detection     | Pylkkänen and McElree Semantic   | 93.2                 | 6
Semantic Anomaly Detection     | Warren Semantic                  | 94.4                 | 6
Semantic Anomaly Detection     | Osterhout and Nicol Semantic     | 84.1                 | 6
Semantic Anomaly Detection     | Osterhout and Mobley Semantic    | 90.6                 | 6
Commonsense Anomaly Detection  | Federmeier and Kutas Commonsense | 62.5                 | 6
Showing 10 of 12 rows

Other info

Code
