
An Empirical Analysis of Static Analysis Methods for Detection and Mitigation of Code Library Hallucinations

About

Despite extensive research, Large Language Models continue to hallucinate when generating code, particularly when using libraries. On NL-to-code benchmarks that require library use, we find that LLMs generate code that uses non-existent library features in 8.1-40% of responses. One intuitive approach for detecting and mitigating hallucinations is static analysis. In this paper, we analyse the potential of static analysis tools, both in terms of what they can solve and what they cannot. We find that static analysis tools can detect 16-70% of all errors and 14-85% of library hallucinations, with performance varying by LLM and dataset. Through manual analysis, we identify cases that a static method could not plausibly catch, placing an upper bound on their potential of 48.5% to 77%. Overall, we show that static analysis methods are a cheap way to address some forms of hallucination, and we quantify how far short of solving the problem they will always fall.
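As a rough illustration of the kind of check a static analysis tool performs (a minimal sketch, not the paper's tooling), the snippet below parses generated Python code and flags attribute accesses on imported modules that do not exist in the installed library. The function name `find_missing_attributes` and its single-pass, top-level-imports-only design are assumptions for this example:

```python
import ast
import importlib


def find_missing_attributes(code: str) -> list[str]:
    """Flag `module.attr` accesses where `attr` is not in the installed module.

    A minimal static check: parse the code, record aliases from top-level
    `import` statements, then verify each direct attribute access against
    the real library. It only catches simple, direct uses, which is one
    reason static methods cannot detect every hallucination.
    """
    tree = ast.parse(code)

    # Map local alias -> imported module name (e.g. `import numpy as np`).
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for name in node.names:
                aliases[name.asname or name.name] = name.name

    missing = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in aliases):
            module = importlib.import_module(aliases[node.value.id])
            if not hasattr(module, node.attr):
                missing.append(f"{node.value.id}.{node.attr}")
    return missing


# `math.sqrt` exists; `math.frobnicate` is a hallucinated feature.
print(find_missing_attributes("import math\nx = math.frobnicate(2)"))
```

A real tool would also resolve `from ... import` statements, chained attributes, and type information for call arguments; even then, semantically wrong but syntactically valid calls pass undetected, which is the gap the paper quantifies.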

Clarissa Miranda-Pena, Andrew Reeson, Cécile Paris, Josiah Poon, Jonathan K. Kummerfeld • 2026

Related benchmarks

Task                           Dataset    Result     Rank
Code Hallucination Detection   BigCode    OP 0.45    16
Code Hallucination Detection   DS-1000    OP 0.57    16
Code Hallucination Detection   ODEX       OP 59      16
