FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation
About
Large language models (LLMs) augmented with retrieval systems have demonstrated significant potential in handling knowledge-intensive tasks. However, these models often struggle with unfaithfulness issues, generating outputs that either ignore the retrieved context or inconsistently blend it with the LLM's parametric knowledge. This issue is particularly severe in cases of knowledge conflict, where the retrieved context conflicts with the model's parametric knowledge. While existing faithful RAG approaches enforce strict context adherence through well-designed prompts or modified decoding strategies, our analysis reveals a critical limitation: they achieve faithfulness by forcibly suppressing the model's parametric knowledge, which undermines the model's internal knowledge structure and increases the risk of misinterpreting the context. To this end, this paper proposes FaithfulRAG, a novel framework that resolves knowledge conflicts by explicitly modeling discrepancies between the model's parametric knowledge and retrieved context. Specifically, FaithfulRAG identifies conflicting knowledge at the fact level and designs a self-thinking process, allowing LLMs to reason about and integrate conflicting facts before generating responses. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. The code is available at https://github.com/DeepLearnXMU/Faithful-RAG
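To make the idea of fact-level conflict modeling concrete, here is a minimal toy sketch in Python. It is not the authors' implementation (see the repository above for that), and everything in it is an assumption made for illustration: the `Fact` triple type, the `find_conflicts` and `build_reasoning_prompt` helpers, and the rule that two facts conflict when they share a subject and relation but differ in the object. In a real system, both fact sets would come from an LLM rather than being hand-written.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """Hypothetical (subject, relation, object) triple used only in this sketch."""
    subject: str
    relation: str
    obj: str


def find_conflicts(parametric, contextual):
    """Pair facts that share (subject, relation) but disagree on the object."""
    by_key = {(f.subject.lower(), f.relation.lower()): f for f in parametric}
    conflicts = []
    for ctx in contextual:
        known = by_key.get((ctx.subject.lower(), ctx.relation.lower()))
        if known is not None and known.obj.lower() != ctx.obj.lower():
            conflicts.append((known, ctx))
    return conflicts


def build_reasoning_prompt(question, context, conflicts):
    """Build a prompt that surfaces each conflict before asking for an answer."""
    lines = [f"Context: {context}", f"Question: {question}"]
    if conflicts:
        lines.append("The context disagrees with what you may recall:")
        for known, ctx in conflicts:
            lines.append(
                f"- {ctx.subject} / {ctx.relation}: "
                f"context says '{ctx.obj}', prior belief says '{known.obj}'"
            )
        lines.append("Reason about each disagreement, then answer from the context.")
    else:
        lines.append("Answer from the context.")
    return "\n".join(lines)


# Toy example: the context contradicts one piece of prior knowledge.
parametric = [
    Fact("Eiffel Tower", "located in", "Paris"),
    Fact("Paris", "capital of", "France"),
]
contextual = [
    Fact("Eiffel Tower", "located in", "Rome"),
    Fact("paris", "capital of", "france"),
]
conflicts = find_conflicts(parametric, contextual)
prompt = build_reasoning_prompt(
    "Where is the Eiffel Tower?",
    "The Eiffel Tower is located in Rome.",
    conflicts,
)
```

The point of the sketch is the contrast drawn in the abstract: rather than instructing the model to ignore what it knows, the disagreement is made explicit so the model can reason about it before answering.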
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Faithfulness Evaluation | FaithEval | F1 Score | 70.2 | 42 |
| Multiple-choice Question Answering | ConFiQA MC | F1 Score | 72.3 | 42 |
| Question Answering | MuSiQue | Accuracy (ACC) | 79.9 | 36 |
| Question Answering | SQuAD KRE-curated version | F1 Score | 66.2 | 36 |
| Open-ended Question Answering | ConFiQA (test) | F1 Score | 75.3 | 36 |
| Multi-step Reasoning Question Answering | ConFiQA MR (test) | F1 Score | 62.4 | 36 |
| Question Answering | MuSiQue | LLM Accuracy | 52.9 | 34 |
| Question Answering | FaithEval | Accuracy | 81.7 | 27 |
| Question Answering | SQuAD | Accuracy (ACC) | 86.3 | 27 |
| Question Answering | RealtimeQA | Accuracy | 84.1 | 27 |