The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination
About
Hallucination is a persistent challenge in large language models (LLMs): even with rigorous quality control, models often generate distorted facts. This paradox, in which error generation continues despite high-quality training data, calls for a deeper understanding of the underlying LLM mechanisms. To address it, we propose a novel concept: knowledge overshadowing, where a model's dominant knowledge can obscure less prominent knowledge during text generation, causing the model to fabricate inaccurate details. Building on this idea, we introduce a novel framework to quantify factual hallucinations by modeling knowledge overshadowing. Central to our approach is the log-linear law, which predicts that the rate of factual hallucination increases linearly with the logarithm of (1) Knowledge Popularity, (2) Knowledge Length, and (3) Model Size. The law provides a means to quantify hallucinations preemptively, offering foresight into their occurrence even before model training or inference. Building on the overshadowing effect, we propose a new decoding strategy, CoDa, to mitigate hallucinations, which notably enhances model factuality on Overshadow (27.9%), MemoTrap (13.1%), and NQ-Swap (18.3%). Our findings not only deepen the understanding of the mechanisms behind hallucinations but also provide actionable insights for developing more predictable and controllable language models.
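To make the log-linear law concrete, the sketch below fits a relation of the form rate = slope * ln(x) + intercept for a single factor and uses it to extrapolate. It is only an illustration under stated assumptions: the function names `fit_log_linear` and `predict_rate` are hypothetical, the single-factor form is a simplification of the law described in the abstract, and the data points are synthetic, not results from the paper.

```python
import math


def fit_log_linear(xs, rates):
    """Ordinary least-squares fit of rate = slope * ln(x) + intercept."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mean_log = sum(logs) / n
    mean_rate = sum(rates) / n
    cov = sum((l - mean_log) * (r - mean_rate) for l, r in zip(logs, rates))
    var = sum((l - mean_log) ** 2 for l in logs)
    slope = cov / var
    intercept = mean_rate - slope * mean_log
    return slope, intercept


def predict_rate(x, slope, intercept):
    """Predicted hallucination rate at x, clipped to the valid range [0, 1]."""
    return min(1.0, max(0.0, slope * math.log(x) + intercept))


# Synthetic values for illustration only (NOT taken from the paper):
# x stands in for one factor, e.g. a knowledge popularity ratio.
popularity_ratios = [2, 4, 8, 16, 32]
observed_rates = [0.10, 0.20, 0.30, 0.40, 0.50]

slope, intercept = fit_log_linear(popularity_ratios, observed_rates)
predicted = predict_rate(64, slope, intercept)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, predicted rate at 64: {predicted:.2f}")
```

In this toy setup, each doubling of the factor adds the same fixed increment to the hallucination rate, which is what a linear dependence on the logarithm implies. Fitting on small values and extrapolating to a larger one mirrors the abstract's claim that the law can anticipate hallucination before training or inference.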
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 73.8 | 1460 |
| Language Understanding | MMLU | Accuracy | 41.2 | 756 |
| Question Answering | ARC Challenge | Accuracy | 56.5 | 749 |
| Question Answering | TriviaQA | Accuracy | 56.3 | 85 |
| Question Answering | Natural Questions | Accuracy | 24.3 | 21 |
| Hallucination Prediction | Overshadowing | Accuracy (Time) | 65 | 16 |
| Factuality | NQ-Swap | Science Category Score | 43.7 | 12 |
| Knowledge Retrieval | MemoTrap | Proverb Score | 42.5 | 12 |
| Hallucination Prediction | MemoTrap | -- | -- | 6 |
| Hallucination Prediction | NQ-Swap | Accuracy (entity) | 29.4 | 4 |