Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation

About

Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground. Previous work has been limited in several ways: it often resorts to artificial settings where the problem is amplified, it disregards some (common) types of hallucinations, and it does not validate the adequacy of detection heuristics. In this paper, we set foundations for the study of NMT hallucinations. First, we work in a natural setting, i.e., in-domain data with no artificial noise in either training or inference. Next, we annotate a dataset of over 3.4k sentences indicating different kinds of critical errors and hallucinations. Then, we turn to detection methods: we revisit methods used previously and propose using glass-box uncertainty-based detectors. Overall, we show that for preventive settings, (i) previously used methods are largely inadequate, and (ii) sequence log-probability works best and performs on par with reference-based methods. Finally, we propose DeHallucinator, a simple method for alleviating hallucinations at test time that significantly reduces the hallucinatory rate. To ease future research, we release our annotated dataset for WMT18 German-English data, along with the model, training data, and code.
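The sequence log-probability detector mentioned in the abstract can be illustrated with a minimal sketch. It assumes the score is the length-normalized sum of the token log-probabilities that the model assigned to its own output, and that a translation is flagged when the score falls below a cutoff. The function names and the threshold value are hypothetical and are not taken from the released code.

```python
import math
from typing import List, Sequence


def seq_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized sequence log-probability (assumed definition).

    token_logprobs holds log p(y_t | y_<t, x) for each generated token.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return sum(token_logprobs) / len(token_logprobs)


def flag_hallucinations(
    batch: Sequence[Sequence[float]], threshold: float
) -> List[bool]:
    """Flag translations whose score is below a (hypothetical) threshold."""
    return [seq_logprob(logprobs) < threshold for logprobs in batch]


# Toy inputs: one confident output and one low-confidence output.
confident = [math.log(0.9)] * 4
uncertain = [math.log(0.1)] * 4
flags = flag_hallucinations([confident, uncertain], threshold=-1.0)
```

In practice the threshold would be tuned on held-out annotated data. The value of -1.0 here only separates the two toy inputs.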

Nuno M. Guerreiro, Elena Voita, André F.T. Martins • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | HotpotQA | F1 Score | 34.1 | 221 |
| Question Answering | PubMedQA (test) | Accuracy | 45.1 | 81 |
| Question Answering | NQ, TriviaQA, and WebQ (test) | Accuracy | 53.6 | 21 |
| Hallucination Detection | CDM (test) | F1 Score | 64 | 16 |
| Hallucination Detection | CF (test) | F1 Score | 81 | 10 |
| Hallucination Detection | E2E (test) | F1-R | 72 | 10 |
| Sentence-Level Confidence Prediction | SciFact | AUROC | 0.505 | 10 |
| Sentence-Level Confidence Prediction | FEVER | AUROC | 0.438 | 10 |
| Sentence-Level Confidence Prediction | WikiBio | AUROC | 48.5 | 10 |
| Hallucination Detection | German-English MT Fully detached hallucinations | AUC | 0.935 | 8 |
Showing 10 of 18 rows
