Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
About
Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold the relevant knowledge. Current approaches to mitigating these hallucinations typically require high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses based solely on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's self-evaluation ability by improving the model's confidence estimation and calibration. We then use these self-annotated responses to fine-tune the model via the Direct Preference Optimization (DPO) algorithm. We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.
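The pipeline described above (self-evaluate candidate responses, then fine-tune on the resulting preferences with DPO) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `margin` threshold, the pairwise pairing rule, and the confidence values are all assumptions made for illustration, and the scores stand in for whatever a Self-Eval-style component would return. Only the per-pair loss follows the standard published DPO formulation.

```python
import math
from itertools import combinations


def build_preference_pairs(prompt, scored_responses, margin=0.1):
    """Turn self-evaluated responses into (chosen, rejected) pairs.

    scored_responses: list of (response_text, confidence) tuples, where
    confidence is a hypothetical self-evaluation score in [0, 1].
    Pairs whose scores differ by less than `margin` (an assumed
    threshold) are skipped as too ambiguous to rank.
    """
    pairs = []
    for (resp_a, conf_a), (resp_b, conf_b) in combinations(scored_responses, 2):
        if abs(conf_a - conf_b) < margin:
            continue
        if conf_a > conf_b:
            chosen, rejected = resp_a, resp_b
        else:
            chosen, rejected = resp_b, resp_a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy (logp_*) and
    under the frozen reference model (ref_logp_*).
    """
    logits = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(x)) rewritten as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))


# Illustrative usage with made-up confidence scores.
scored = [("A", 0.9), ("B", 0.2), ("C", 0.85)]
pairs = build_preference_pairs("Who wrote Hamlet?", scored)
```

In this toy input, "A" and "C" have nearly equal scores and are not paired with each other, while each is preferred over "B". In practice the log-probabilities passed to `dpo_loss` would come from the policy and reference models, typically as batched tensors rather than Python floats.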
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Instruction Following | MT-Bench | MT-Bench Score | 5.31 | 189 |
| Faithfulness Hallucination | FollowRAG Faithfulness+ | Faithfulness (NaturalQA) | 43.5 | 60 |
| Instruction Following | MT-bench v1.0 (test) | MT-Bench Score | 49.5 | 52 |
| Factuality Hallucination Evaluation | BioGEN (test) | FactScore | 48.3 | 30 |
| Factuality Hallucination Evaluation | LongFact (test) | Response Score | 100 | 30 |
| Factuality Hallucination | BioGEN | FactScore | 46.8 | 30 |
| Instruction Following | FollowRAG Instruction | FollowRAG Instruction Score | 38.5 | 30 |
| Instruction Following | FollowRAG Instruction v1 (test) | FollowRAG Instruction Score | 38.1 | 30 |
| Factuality Hallucination | LongFact | Facts Score | 15.7 | 30 |
| Truthful and Informative Generation | TruthfulQA (test) | True*Info (%) | 61.88 | 12 |