In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

About

Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we examine the underlying mechanisms of LLM hallucinations from the perspective of inner representations and discover a salient pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens than incorrect ones. Leveraging this insight, we propose an entropy-based metric to quantify the "sharpness" among the in-context hidden states and incorporate it into the decoding process to formulate a constrained decoding approach. Experiments on various knowledge-seeking and hallucination benchmarks demonstrate our approach's consistent effectiveness, for example, achieving up to an 8.6 point improvement on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.

Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He • 2024
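
The abstract describes an entropy-based sharpness metric over in-context hidden states that is folded into decoding, but it does not state the formula. The sketch below is one plausible reading, not the authors' implementation. It assumes that each in-context hidden state is projected through the unembedding matrix to get a per-token activation, that these activations are normalized across context positions, and that the resulting entropy is subtracted from the next-token logits. The function names `in_context_entropy` and `adjust_logits` and the weight `alpha` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def in_context_entropy(context_hidden, unembed):
    """Entropy of each candidate token's activation across context positions.

    context_hidden: (n_ctx, d) hidden states of the in-context tokens.
    unembed:        (d, vocab) output projection (assumed, not from the source).
    Returns an array of shape (vocab,). Lower entropy means a sharper pattern.
    """
    # Probability assigned to every candidate token at every context position.
    probs = softmax(context_hidden @ unembed, axis=-1)   # (n_ctx, vocab)
    # Normalize over context positions so each token gets a distribution.
    dist = probs / probs.sum(axis=0, keepdims=True)      # (n_ctx, vocab)
    # Shannon entropy of that distribution, per candidate token.
    return -np.sum(dist * np.log(dist + 1e-12), axis=0)


def adjust_logits(next_logits, entropy, alpha=1.0):
    """Penalize candidates with flat (high-entropy) in-context activations."""
    return next_logits - alpha * entropy


if __name__ == "__main__":
    # Toy setup: 4 context positions, 3-token vocabulary, identity unembedding.
    hidden = np.array([[8.0, 0.0, 0.0],
                       [-8.0, 0.0, 0.0],
                       [-8.0, 0.0, 0.0],
                       [-8.0, 0.0, 0.0]])
    ent = in_context_entropy(hidden, np.eye(3))
    print("entropy per token:", ent)
    print("adjusted logits:", adjust_logits(np.zeros(3), ent))
```

In this toy setup, token 0 is strongly activated at a single context position, so it receives a much lower entropy than the other tokens and is favored after adjustment when the original logits are tied.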

Related benchmarks

Task	Dataset	Result
Text Generation	GSM8K	Accuracy 53.39	63
Multiple-Choice Classification	MMLU	Accuracy 60.69	47
Short Answer Question Answering	HotpotQA	F1 37	39
Open-ended generation	TriviaQA	--	37
Multiple-choice Question Answering	ARC-C	Accuracy 48.93	22
Free-form text generation	CoQA	Accuracy 63.4	22
Question Answering	CoQA	Factual Accuracy 25.02	21
Question Answering	TYDIQAGP	Factual Accuracy 27.12	21
Question Answering	TruthfulQA	Factual Accuracy 43.53	21
Question Answering	TriviaQA	Factual Accuracy 44.17	21

Showing 10 of 16 rows
