
An Empirical Analysis of Static Analysis Methods for Detection and Mitigation of Code Library Hallucinations

About

Despite extensive research, Large Language Models continue to hallucinate when generating code, particularly when using libraries. On NL-to-code benchmarks that require library use, we find that LLMs generate code that uses non-existent library features in 8.1-40% of responses. One intuitive approach for detecting and mitigating hallucinations is static analysis. In this paper, we analyse the potential of static analysis tools, both in terms of what they can solve and what they cannot. We find that static analysis tools can detect 16-70% of all errors and 14-85% of library hallucinations, with performance varying by LLM and dataset. Through manual analysis, we identify cases that a static method could not plausibly catch, placing an upper bound on their potential of 48.5% to 77%. Overall, we show that static analysis methods are a cheap way to address some forms of hallucination, and we quantify how far short of solving the problem they will always fall.
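As a rough illustration of the kind of check a static analysis tool performs (a minimal sketch, not the paper's tooling), the snippet below parses generated Python code and flags attribute accesses on imported modules that do not exist in the installed library. The function name `find_missing_attributes` and its single-pass, top-level-imports-only design are assumptions for this example:

```python
import ast
import importlib


def find_missing_attributes(code: str) -> list[str]:
    """Flag `module.attr` accesses where `attr` is not in the installed module.

    A minimal static check: parse the code, record aliases from top-level
    `import` statements, then verify each direct attribute access against
    the real library. It only catches simple, direct uses, which is one
    reason static methods cannot detect every hallucination.
    """
    tree = ast.parse(code)

    # Map local alias -> imported module name (e.g. `import numpy as np`).
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for name in node.names:
                aliases[name.asname or name.name] = name.name

    missing = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in aliases):
            module = importlib.import_module(aliases[node.value.id])
            if not hasattr(module, node.attr):
                missing.append(f"{node.value.id}.{node.attr}")
    return missing


# `math.sqrt` exists; `math.frobnicate` is a hallucinated feature.
print(find_missing_attributes("import math\nx = math.frobnicate(2)"))
```

A real tool would also resolve `from ... import` statements, chained attributes, and type information for call arguments; even then, semantically wrong but syntactically valid calls pass undetected, which is the gap the paper quantifies.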

Clarissa Miranda-Pena, Andrew Reeson, Cécile Paris, Josiah Poon, Jonathan K. Kummerfeld • 2026

Related benchmarks

Task                           Dataset    Result     Rank
Code Hallucination Detection   BigCode    OP 0.45    16
Code Hallucination Detection   DS-1000    OP 0.57    16
Code Hallucination Detection   ODEX       OP 59      16
