Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models

About

The generation of plausible but incorrect factual information, often termed hallucination, has attracted significant research interest. Retrieval-augmented language models (RALMs), which enhance models with up-to-date knowledge, have emerged as a promising way to reduce hallucination. However, existing RALMs may instead exacerbate hallucination when the retrieved contexts are lengthy. To address this challenge, we propose COFT, a novel COarse-to-Fine highlighTing method that focuses on key texts at different levels of granularity, thereby avoiding getting lost in lengthy contexts. Specifically, COFT consists of three components: a recaller, a scorer, and a selector. First, the recaller applies a knowledge graph to extract potential key entities in a given context. Second, the scorer measures the importance of each entity by calculating its contextual weight. Finally, the selector selects entities with high contextual weight using a dynamic threshold algorithm and highlights the corresponding paragraphs, sentences, or words in a coarse-to-fine manner. Extensive experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, with improvements of over 30% in F1 score. Moreover, COFT also exhibits remarkable versatility across various long-form tasks, such as reading comprehension and question answering.
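The three-stage flow described above (recall entities, score them, select and highlight) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the knowledge graph is reduced to a plain list of entity names, the "contextual weight" is replaced by a simple occurrence count that is doubled when the entity also appears in the query, and the "dynamic threshold" is assumed to be the mean score for the given context. Only the overall recaller, scorer, selector structure and the paragraph, sentence, or word granularity come from the abstract.

```python
import re


def recall_entities(context: str, kg_entities: list[str]) -> list[str]:
    """Recaller stand-in: keep knowledge-graph entities that occur in the context."""
    lowered = context.lower()
    return [e for e in kg_entities if e.lower() in lowered]


def score_entities(entities: list[str], context: str, query: str) -> dict[str, float]:
    """Scorer stand-in (assumption, not the paper's contextual weight):
    occurrence count in the context, doubled if the entity is in the query."""
    ctx, q = context.lower(), query.lower()
    scores = {}
    for e in entities:
        count = ctx.count(e.lower())
        scores[e] = count * (2.0 if e.lower() in q else 1.0)
    return scores


def select_entities(scores: dict[str, float]) -> list[str]:
    """Selector stand-in: threshold is the mean score, so it adapts per context."""
    if not scores:
        return []
    threshold = sum(scores.values()) / len(scores)
    return [e for e, s in scores.items() if s >= threshold]


def highlight(context: str, selected: list[str], granularity: str = "sentence") -> str:
    """Wrap the chosen units in ** markers at word, sentence, or paragraph level."""
    if granularity == "word":
        for e in selected:
            # Wrap each occurrence of the entity itself, preserving its casing.
            context = re.sub(
                re.escape(e),
                lambda m: f"**{m.group(0)}**",
                context,
                flags=re.IGNORECASE,
            )
        return context
    if granularity == "paragraph":
        units, sep = context.split("\n\n"), "\n\n"
    else:
        # Naive sentence split on whitespace that follows ., ! or ?
        units, sep = re.split(r"(?<=[.!?])\s+", context), " "
    marked = [
        f"**{u}**" if any(e.lower() in u.lower() for e in selected) else u
        for u in units
    ]
    return sep.join(marked)
```

With the context "Marie Curie was born in Warsaw. She won the Nobel Prize twice. Warsaw is the capital of Poland.", the query "Where was Marie Curie born?", and the candidate entities Marie Curie, Warsaw, Nobel Prize, and Paris, this sketch selects Marie Curie and Warsaw and, at sentence granularity, marks the first and third sentences. The highlighted context would then be passed to the language model in place of the raw retrieved text.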

Qitan Lv, Jie Wang, Hanzhu Chen, Bin Li, Yongdong Zhang, Feng Wu • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Question Answering | 2Wiki | EM | 45.5 | 241 |
| Multi-hop Question Answering | HotpotQA | LLM Judge Score | 61.71 | 72 |
| Question Answering | Bamboogle | EM | 40.3 | 61 |
| Question Answering | MuSiQue | EM | 18.5 | 38 |
| Multi-hop Question Answering | 2Wiki | EM | 41.86 | 16 |
| Multi-hop Question Answering | Bamboogle | EM | 35.71 | 16 |
| Multi-hop Question Answering | MuSiQue | EM | 17.12 | 16 |
| Multi-question Reasoning | MuSiQue-3Q | Exact Match (EM) | 12.6 | 6 |
| Multi-question Reasoning | HotpotQA 3Q | Exact Match Accuracy (3Q) | 24.5 | 6 |
| Multi-question Reasoning | 2Wiki-3Q | Exact Match (EM) | 27.2 | 6 |
Showing 10 of 15 rows
