FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation
About
Large language models (LLMs) augmented with retrieval systems have demonstrated significant potential in handling knowledge-intensive tasks. However, these models often struggle with unfaithfulness, generating outputs that either ignore the retrieved context or inconsistently blend it with the LLM's parametric knowledge. This issue is particularly severe in cases of knowledge conflict, where the retrieved context contradicts the model's parametric knowledge. While existing faithful RAG approaches enforce strict context adherence through carefully designed prompts or modified decoding strategies, our analysis reveals a critical limitation: they achieve faithfulness by forcibly suppressing the model's parametric knowledge, which undermines the model's internal knowledge structure and increases the risk of misinterpreting the context. To address this, this paper proposes FaithfulRAG, a novel framework that resolves knowledge conflicts by explicitly modeling discrepancies between the model's parametric knowledge and the retrieved context. Specifically, FaithfulRAG identifies conflicting knowledge at the fact level and designs a self-thinking process, allowing LLMs to reason about and integrate conflicting facts before generating responses. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. The code is available at https://github.com/DeepLearnXMU/Faithful-RAG
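The pipeline described above (elicit parametric facts, extract context facts, self-think over conflicts, then answer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording and the `llm` callable are placeholder assumptions; see the linked repository for the actual method.

```python
# Hedged sketch of a fact-level conflict-modeling RAG pipeline.
# `llm` is any callable that maps a prompt string to a completion string;
# the prompts below are illustrative placeholders, not the paper's prompts.

def faithful_rag_answer(question: str, context: str, llm) -> str:
    """Resolve context/parametric conflicts at the fact level, then answer."""
    # 1. Elicit the model's own (parametric) knowledge as discrete facts,
    #    rather than suppressing it outright.
    parametric_facts = llm(
        f"List the facts you already know that are relevant to: {question}"
    )
    # 2. Extract the facts asserted by the retrieved context.
    context_facts = llm(
        f"List the facts stated in this passage:\n{context}"
    )
    # 3. Self-thinking: reason explicitly about conflicts between the two
    #    fact sets and decide which facts to trust.
    reconciled = llm(
        "Compare these fact sets, identify any conflicts, and state which "
        "facts should be trusted when answering the question.\n"
        f"Parametric facts: {parametric_facts}\n"
        f"Context facts: {context_facts}"
    )
    # 4. Generate the final answer conditioned on the reconciled facts.
    return llm(
        f"Question: {question}\nReconciled facts: {reconciled}\nAnswer:"
    )


# Toy demo with a canned "LLM" so the sketch runs without an API key.
def toy_llm(prompt: str) -> str:
    return f"[model response to: {prompt[:30]}...]"

answer = faithful_rag_answer(
    "Who wrote Hamlet?", "Hamlet was written by Shakespeare.", toy_llm
)
print(answer)
```

The key design point, per the abstract, is step 3: instead of forcing the model to follow the context unconditionally, conflicting facts are surfaced and reasoned over explicitly before generation.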
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | MuSiQue | Accuracy (ACC) | 79.9 | 36 |
| Question Answering | FaithEval | Accuracy | 81.7 | 27 |
| Question Answering | SQuAD | Accuracy (ACC) | 86.3 | 27 |
| Question Answering | RealtimeQA | Accuracy | 84.1 | 27 |
| Question Answering | MuSiQue entity-level knowledge conflict (test) | Mean Rank (MR) | 7.7 | 24 |
| Question Answering | SQuAD entity-level knowledge conflict (test) | Mean Rank (MR) | 9.7 | 24 |
| Question Answering | MuSiQue | LLM Accuracy | 52.9 | 20 |
| Question Answering | HotpotQA | LLM Accuracy | 76.9 | 20 |
| Long-form Question Answering | GraphRAG-Bench Med | LLM Accuracy | 75.4 | 20 |
| Long-form Question Answering | Novel GraphRAG-Bench | LLM Accuracy | 60.7 | 20 |