Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering

About

Training LLMs on data containing unfamiliar knowledge during the instruction tuning stage can encourage hallucinations. To address this challenge, we introduce NOVA, a novel framework designed to identify high-quality data that aligns well with the LLM's learned knowledge to reduce hallucinations. NOVA includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data. Specifically, ICP evaluates the LLM's understanding of the given instruction by calculating the tailored consistency among multiple self-generated responses. SEI further assesses the familiarity of the LLM with the target response by comparing it to the generated responses, using the proposed semantic clustering and well-designed voting strategy. Finally, to ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity. By considering data quality and avoiding unfamiliar data, we can utilize the selected data to effectively align LLMs to follow instructions and hallucinate less.
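The two familiarity signals described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the "tailored consistency" measure, the clustering method, or the voting rule. The `jaccard` token-overlap similarity, the greedy clustering, the `threshold` parameter, and the function names `icp_score` and `sei_score` are all assumptions introduced here for illustration only.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: token-set overlap (an assumption, not NOVA's measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def icp_score(responses: list[str]) -> float:
    """ICP-style signal: mean pairwise similarity among self-generated responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # zero or one response is trivially self-consistent
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def sei_score(target: str, responses: list[str], threshold: float = 0.5) -> float:
    """SEI-style signal: cluster the responses, then vote.

    Returns the share of generated responses that fall in the same
    cluster as the target response (0.0 if the target matches none).
    """
    # Greedy clustering: compare each response to the first member of each cluster.
    clusters: list[list[str]] = []
    for response in responses:
        for cluster in clusters:
            if jaccard(response, cluster[0]) >= threshold:
                cluster.append(response)
                break
        else:
            clusters.append([response])

    # Vote: size of the first cluster the target matches, relative to all responses.
    for cluster in clusters:
        if jaccard(target, cluster[0]) >= threshold:
            return len(cluster) / len(responses)
    return 0.0


if __name__ == "__main__":
    sampled = ["paris", "paris", "lyon"]  # hypothetical self-generated responses
    print(icp_score(sampled))
    print(sei_score("paris", sampled))
```

Under these assumptions, a sample with high scores on both signals would count as familiar to the model, and a sample whose target response matches no cluster would be a candidate for filtering. In practice a semantic similarity model would replace the token-overlap stand-in.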

Shuzheng Si, Haozhe Zhao, Gang Chen, Cheng Gao, Yuzhuo Bai, Zhitong Wang, Kaikai An, Kangyang Luo, Chen Qian, Fanchao Qi, Baobao Chang, Maosong Sun • 2025

Related benchmarks

Task	Dataset	Result
Instruction Following	MT-Bench	MT-Bench Score 6.46	287
Faithfulness Hallucination	FollowRAG Faithfulness+	Faithfulness (NaturalQA) 49.5	60
Instruction Following	MT-bench v1.0 (test)	MT-Bench Score 60.8	52
General Capability Evaluation	General Capability Suite MMLU, GSM8K, HumanEval, IFEval	Common Average Score 77.78	39
Factuality Hallucination	BioGEN	FactScore 49.1	30
Factuality Hallucination Evaluation	BioGEN (test)	FactScore 50.5	30
Factuality Hallucination	LongFact	Facts Score 21.5	30
Factuality Hallucination Evaluation	LongFact (test)	Response Score 100	30
Instruction Following	FollowRAG Instruction v1 (test)	FollowRAG Instruction Score 40.1	30
Instruction Following	FollowRAG Instruction	FollowRAG Instruction Score 40.1	30

Showing 10 of 11 rows