ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

About

Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the closed-source nature of the latest large language models (LLMs). This study investigates applying conformal prediction (CP), which can transform any heuristic uncertainty notion into rigorous prediction sets, to black-box LLMs in open-ended NLG tasks. We introduce a novel uncertainty measure based on self-consistency theory, and then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm. Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods. Furthermore, we achieve strict control over the correctness coverage rate using 7 popular LLMs on 4 free-form NLG datasets, spanning general-purpose and medical scenarios. Additionally, the small size of the calibrated prediction sets further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.

Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Xiaoshuang Shi, Kaidi Xu, Hengtao Shen, Xiaofeng Zhu • 2024
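
To make the abstract's central idea concrete, the following is a minimal sketch of generic split conformal prediction applied to sampled LLM answers. It is not the authors' implementation. The function names (`frequency_scores`, `conformal_threshold`, `prediction_set`) are hypothetical, and the score used here (one minus the fraction of sampled generations that agree with an answer) is an assumed stand-in for the paper's self-consistency-based uncertainty measure, which this page does not specify.

```python
import math
from collections import Counter


def frequency_scores(sampled_answers):
    """Assumed uncertainty score: 1 - relative frequency of each distinct answer.

    Answers that the model produces more consistently get a lower score.
    """
    counts = Counter(sampled_answers)
    total = len(sampled_answers)
    return {answer: 1 - count / total for answer, count in counts.items()}


def conformal_threshold(calibration_scores, alpha):
    """Standard split-conformal quantile.

    calibration_scores holds, for each calibration question, the score of
    the correct answer. The returned threshold is the
    ceil((n + 1) * (1 - alpha))-th smallest score; if that rank exceeds n,
    the threshold is infinite (every candidate is kept).
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")
    return sorted(calibration_scores)[rank - 1]


def prediction_set(candidate_scores, threshold):
    """Keep every candidate answer whose score does not exceed the threshold."""
    return [a for a, s in candidate_scores.items() if s <= threshold]


# Toy calibration data (made up for illustration only).
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
threshold = conformal_threshold(calibration, alpha=0.25)

# Ten hypothetical sampled generations for one test question.
samples = ["Paris"] * 6 + ["Lyon"] * 3 + ["Nice"]
print(prediction_set(frequency_scores(samples), threshold))
```

Under the usual exchangeability assumption between calibration and test questions, a threshold chosen this way yields sets that contain a correct answer with probability at least 1 - alpha, provided a correct answer appears among the sampled candidates. Smaller sets at the same coverage level indicate a more informative uncertainty score.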

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | NQ (test) | AUROC | 78.8 | 90 |
| Question Answering | WQ (test) | AUROC | 72.2 | 90 |
| Question Answering | NQ | Absolute Execution Time Overhead (s) | 713.2 | 90 |
| Question Answering | WQ | Absolute Execution Time Overhead (s) | 618.6 | 90 |
| Inference Efficiency | Natural Questions (NQ) | Relative Overhead (%) | 204.2 | 90 |
| Question Answering | TQA | Absolute Execution Time Overhead (s) | 738.9 | 90 |
| Question Answering | NQ | PRR | 0.499 | 90 |
| Question Answering | WQ | PRR | 43.4 | 90 |
| Question Answering | TQA (test) | AUROC | 77.1 | 90 |
| Question Answering | TQA | PRR | 51.2 | 90 |