ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
About
Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the closed-source nature of the latest large language models (LLMs). This study investigates applying conformal prediction (CP), which can transform any heuristic uncertainty notion into rigorous prediction sets, to black-box LLMs in open-ended NLG tasks. We introduce a novel uncertainty measure based on self-consistency theory, and then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm. Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods. Furthermore, we achieve strict control over the correctness coverage rate using 7 popular LLMs on 4 free-form NLG datasets, spanning general-purpose and medical scenarios. Additionally, the small size of the calibrated prediction sets further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.
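To make the idea of turning a heuristic uncertainty score into a calibrated prediction set more concrete, here is a minimal sketch of generic split conformal prediction over sampled answers. It is not the paper's ConU criterion. It assumes a simple frequency-based nonconformity score (one minus the share of sampled responses that give a particular answer), exact string matching between answers, and that each calibration question contributes the score of its correct answer. The function names `frequency_scores`, `calibrate_threshold`, and `prediction_set` are hypothetical.

```python
import math
from collections import Counter


def frequency_scores(samples):
    """Assumed nonconformity score: 1 - relative frequency of each distinct answer."""
    counts = Counter(samples)
    total = len(samples)
    return {answer: 1.0 - count / total for answer, count in counts.items()}


def calibrate_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    calibration_scores holds, for each calibration question, the score of
    its correct answer (an assumption of this sketch).
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every candidate.
        return float("inf")
    return sorted(calibration_scores)[rank - 1]


def prediction_set(samples, threshold):
    """Keep every sampled answer whose score does not exceed the threshold."""
    scores = frequency_scores(samples)
    return {answer for answer, score in scores.items() if score <= threshold}


# Toy data, invented for illustration only.
calibration_scores = [0.0, 0.2, 0.4, 0.4, 0.6, 0.8, 1.0]
threshold = calibrate_threshold(calibration_scores, alpha=0.25)

test_samples = ["Paris"] * 6 + ["Lyon"] * 3 + ["Nice"]
answers = prediction_set(test_samples, threshold)
```

Under the usual exchangeability assumption between calibration and test questions, a threshold chosen this way yields sets that contain the correct answer with probability at least 1 - alpha, provided the correct answer is among the sampled candidates. A smaller `alpha` raises the threshold and therefore produces larger sets.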
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Question Answering | NQ (test) | AUROC | 78.8 | 90 |
| Question Answering | WQ (test) | AUROC | 72.2 | 90 |
| Question Answering | NQ | Absolute Execution Time Overhead (s) | 713.2 | 90 |
| Question Answering | WQ | Absolute Execution Time Overhead (s) | 618.6 | 90 |
| Inference Efficiency | Natural Questions (NQ) | Relative Overhead (%) | 204.2 | 90 |
| Question Answering | TQA | Absolute Execution Time Overhead (s) | 738.9 | 90 |
| Question Answering | NQ | PRR | 0.499 | 90 |
| Question Answering | WQ | PRR | 43.4 | 90 |
| Question Answering | TQA (test) | AUROC | 77.1 | 90 |
| Question Answering | TQA | PRR | 51.2 | 90 |