Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification

About

Large language models have shown impressive capabilities in code generation, yet they often produce functionally incorrect code. Uncertainty quantification (UQ) methods have emerged as a promising approach for detecting hallucinations in natural language generation, but their effectiveness for code generation tasks remains underexplored. We systematically evaluate how UQ techniques transfer to code generation across three programming languages, five LLMs, and over 1,700 problems. We find that some token-probability-based methods generalize effectively without modification, while sampling-based methods relying on natural language inference (NLI) fail because NLI models cannot distinguish functionally different code, causing most responses to collapse into a single semantic cluster. To address this, we introduce functional equivalence methods, a family of code-specific methods that replace NLI-based semantic equivalence with an LLM-based functional equivalence assessment, including functional entropy, a code-specific analog of semantic entropy. Functional equivalence methods achieve top AUROC in 11 out of 15 model-benchmark combinations and the best calibration across most settings, consistently outperforming both NLI-based counterparts and all other methods evaluated.
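
To make the idea of functional entropy concrete, here is a minimal sketch. It assumes the same recipe as semantic entropy, with the equivalence check swapped out: sampled responses are grouped into clusters of functionally equivalent programs, and the entropy of the cluster frequencies is the uncertainty score. The abstract does not give the exact formulation, so the greedy clustering, the frequency-based entropy, and every name below (`cluster_by_equivalence`, `functional_entropy`, `probe_judge`) are illustrative assumptions. In particular, `probe_judge` is a toy stand-in that compares outputs on a few inputs; the paper uses an LLM-based functional equivalence assessment instead.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_by_equivalence(
    samples: Sequence[T], equivalent: Callable[[T, T], bool]
) -> List[List[T]]:
    """Greedily group samples: each sample joins the first cluster whose
    representative (first member) is judged equivalent to it."""
    clusters: List[List[T]] = []
    for sample in samples:
        for cluster in clusters:
            if equivalent(cluster[0], sample):
                cluster.append(sample)
                break
        else:
            clusters.append([sample])
    return clusters


def functional_entropy(
    samples: Sequence[T], equivalent: Callable[[T, T], bool]
) -> float:
    """Entropy (in nats) of the empirical distribution over clusters.
    Assumption: clusters are weighted by sample frequency."""
    if not samples:
        raise ValueError("need at least one sampled response")
    clusters = cluster_by_equivalence(samples, equivalent)
    n = len(samples)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


# Hypothetical judge: a stand-in for the LLM-based equivalence assessment.
# It treats two candidate functions as equivalent if they agree on probes.
def probe_judge(f: Callable[[int], int], g: Callable[[int], int]) -> bool:
    return all(f(x) == g(x) for x in (0, 1, 2, 3))


# Three sampled "responses": two compute 2x, one computes x squared.
candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2]
score = functional_entropy(candidates, probe_judge)
```

In this sketch, samples that all land in one cluster give an entropy of zero, and the score grows as the samples split into more functionally distinct groups. This is also why an equivalence check that merges functionally different code into a single cluster, as the abstract reports for NLI models, yields an uninformative score.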

Dylan Bouchard, Mohit Singh Chauhan, Zeya Ahmad, Ho-Kyeong Ra • 2026

Related benchmarks

Task	Dataset	Metric	Value	
Code Correctness Prediction	LiveCodeBench Python	AUROC	86.7	60
Code Correctness Prediction	LiveCodeBench Python	Brier Score	0.073	60
Code Correctness Prediction	MultiPL-E Java	AUROC	0.701	60
Predicting code correctness	LiveCodeBench Python	ECE	0.024	60
Code Correctness Prediction	MultiPL-E Java	Brier Score	0.243	60
Code Correctness Prediction	MultiPL-E Java	ECE	0.155	60
Code correctness classification	LiveSQLBench SQLite	AUROC	0.842	55
Predicting code correctness	LiveSQLBench SQLite	Brier Score	0.129	55
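
For readers unfamiliar with the three metrics in the table, the sketch below shows their standard definitions, given a confidence score in [0, 1] for each generated program and a 0/1 label for whether it passed its tests. It is not the paper's evaluation code: the function names are illustrative, and the use of 10 equal-width bins for ECE is an assumption, since the binning scheme is not stated here.

```python
from typing import List, Sequence


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between confidence and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(conf, correct)) / len(conf)


def auroc(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a correct program gets a higher confidence than an
    incorrect one, counting ties as one half."""
    pos = [p for p, y in zip(conf, correct) if y == 1]
    neg = [p for p, y in zip(conf, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both correct and incorrect examples")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average of |accuracy - mean confidence| over equal-width
    bins. Assumption: n_bins equal-width bins on [0, 1]."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(conf):
        # A confidence of exactly 1.0 falls into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append(i)
    total = len(conf)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf[i] for i in members) / len(members)
        accuracy = sum(correct[i] for i in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Toy data: four programs, the two high-confidence ones are correct.
confidences = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
```

Higher AUROC is better, while lower Brier score and lower ECE are better. AUROC measures how well the score ranks correct above incorrect code; Brier score and ECE measure how closely the confidence values match observed correctness rates.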
