
Lyapunov Probes for Hallucination Detection in Large Foundation Models

About

We address hallucination detection in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) by framing the problem through the lens of dynamical systems stability theory. Rather than treating hallucination as a straightforward classification task, we conceptualize (M)LLMs as dynamical systems, where factual knowledge is represented by stable equilibrium points within the representation space. Our main insight is that hallucinations tend to arise at the boundaries of knowledge-transition regions separating stable and unstable zones. To capture this phenomenon, we propose Lyapunov Probes: lightweight networks trained with derivative-based stability constraints that enforce a monotonic decay in confidence under input perturbations. By performing systematic perturbation analysis and applying a two-stage training process, these probes reliably distinguish between stable factual regions and unstable, hallucination-prone regions. Experiments on diverse datasets and models demonstrate consistent improvements over existing baselines.
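The core mechanism can be sketched in code. Below is a minimal, hypothetical illustration of a Lyapunov Probe training objective: a lightweight network scores a hidden state, and a derivative-style hinge penalty encourages the score to decay (not grow) under small input perturbations of factual states. All names, the loss weighting, and the perturbation scheme are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a Lyapunov-Probe-style objective (assumed design,
# not the paper's actual code): a small probe V(h) over hidden states,
# trained with a classification loss plus a stability penalty that
# enforces V(h + eps*d) <= V(h) on factual ("stable") examples.
import torch
import torch.nn as nn


class LyapunovProbe(nn.Module):
    """Lightweight probe mapping a hidden state to a scalar score V(h)."""

    def __init__(self, dim: int, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, width), nn.Tanh(), nn.Linear(width, 1)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h).squeeze(-1)


def probe_loss(probe, h, labels, eps=0.05, margin=0.0, stab_weight=0.1):
    """Binary factuality loss plus a perturbation-based stability penalty.

    labels: 1.0 for factual (stable) states, 0.0 for hallucination-prone.
    The hinge term penalizes any *increase* of V under a random unit
    perturbation of a factual state, approximating monotonic decay.
    """
    v = probe(h)
    cls = nn.functional.binary_cross_entropy_with_logits(v, labels)

    # Random unit-norm perturbation direction per example.
    d = torch.randn_like(h)
    d = d / d.norm(dim=-1, keepdim=True)
    v_pert = probe(h + eps * d)

    # Hinge penalty on factual examples only: want v_pert <= v - margin.
    stab = torch.relu(v_pert - v + margin)[labels > 0.5].mean()
    return cls + stab_weight * stab


# Example usage on synthetic hidden states.
torch.manual_seed(0)
probe = LyapunovProbe(dim=16)
h = torch.randn(8, 16)
labels = (torch.arange(8) % 2).float()  # alternate factual / hallucinated
loss = probe_loss(probe, h, labels)
```

A two-stage schedule, as described above, would fit naturally here: first fit the classification term alone, then fine-tune with the stability penalty enabled.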

Bozhi Luan, Gen Li, Yalan Qin, Jifeng Guo, Yun Zhou, Faguo Wu, Hongwei Zheng, Wenjun Wu, Zhaoxin Fan • 2026

Related benchmarks

Task                     Dataset              Metric   Result   Rank
Hallucination Detection  TriviaQA             -        -        438
Hallucination Detection  POPE official (val)  A-PR     99.13    34
Hallucination Detection  PopQA                AUPRC    67.08    20
Hallucination Detection  CoQA                 AUPRC    89.01    20
Hallucination Detection  MMLU                 AUPRC    87.48    20
Hallucination Detection  TextVQA (val)        AUPRC    96.98    4
Hallucination Detection  Vizwiz (val)         AUPRC    85.17    4
Hallucination Detection  MME (val)            AUPRC    97.57    4
