
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs

About

Accurate uncertainty quantification in large language models (LLMs) is essential for reliable confidence estimation, yet fine-tuned LLMs often become overconfident when adaptation data is limited. Existing uncertainty methods for LLMs adapted with parameter-efficient fine-tuning (PEFT) are largely post hoc: they estimate uncertainty after fine-tuning rather than improving how adapters specialize to task-specific input-output relationships. We propose Functional-Level Uncertainty Quantification for Calibrated Fine-Tuning (UQ4CT), which calibrates uncertainty over the functional space induced by prompt-dependent mixtures of LoRA experts. UQ4CT implements this perspective through a mixture-of-experts fine-tuning framework, in which a calibration loss aligns functional-level confidence with predictive correctness during training. Across four multiple-choice benchmarks and two open-ended generative QA tasks, UQ4CT reduces Expected Calibration Error (ECE) by over 25% while preserving high accuracy. Under distribution shift, UQ4CT maintains superior calibration and competitive accuracy, demonstrating improved reliability and generalization for fine-tuned LLMs.
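The abstract describes prompt-dependent mixtures of LoRA experts trained with a calibration loss that ties confidence to correctness. The NumPy sketch below shows one way such a layer and loss could fit together. It is not the authors' implementation: the top-k router gating, the use of total router mass on the selected experts as "functional-level confidence", the squared-error form of the calibration loss, and the function names `moe_lora_forward`, `functional_confidence`, and `calibration_loss` are all assumptions made for illustration.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def moe_lora_forward(x, W0, A, B, Wg, k=2, scale=1.0):
    """Frozen linear layer plus a prompt-dependent mixture of LoRA experts.

    x:  (n, d_in) inputs (e.g. a pooled prompt representation; an assumption)
    W0: (d_out, d_in) frozen base weight
    A:  (E, r, d_in) LoRA down-projections, one per expert
    B:  (E, d_out, r) LoRA up-projections, one per expert
    Wg: (E, d_in) router weights
    Returns outputs (n, d_out), router probabilities (n, E), top-k indices (n, k).
    """
    probs = softmax(x @ Wg.T)                      # per-input distribution over experts
    topk = np.argsort(-probs, axis=-1)[:, :k]      # k most probable experts per input
    out = x @ W0.T                                 # frozen base path
    for i in range(x.shape[0]):
        sel = topk[i]
        gates = probs[i, sel] / probs[i, sel].sum()  # renormalize over selected experts
        for g, e in zip(gates, sel):
            out[i] += scale * g * (B[e] @ (A[e] @ x[i]))  # gated low-rank update B_e A_e x
    return out, probs, topk


def functional_confidence(probs, topk):
    """Hypothetical confidence: router mass assigned to the selected experts."""
    return np.take_along_axis(probs, topk, axis=-1).sum(axis=-1)


def calibration_loss(confidence, correct):
    """Mean squared gap between confidence and 0/1 correctness (Brier-style)."""
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((confidence - correct) ** 2))


# Toy usage with random weights.
rng = np.random.default_rng(0)
n, d_in, d_out, E, r = 4, 8, 6, 4, 2
x = rng.normal(size=(n, d_in))
W0 = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(E, r, d_in)) * 0.1
B = np.zeros((E, d_out, r))  # LoRA up-projections are commonly zero-initialized
Wg = rng.normal(size=(E, d_in))

out, probs, topk = moe_lora_forward(x, W0, A, B, Wg, k=2)
conf = functional_confidence(probs, topk)
loss = calibration_loss(conf, correct=[1, 0, 1, 1])  # hypothetical correctness labels
```

Because `B` starts at zero, the experts initially contribute nothing and `out` equals the frozen base path. In a training loop, `A`, `B`, and `Wg` would be updated, and a calibration term of this kind would presumably be added, with some weight, to the usual task loss.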

Ruijia Niu, Dongxia Wu, Rose Yu, Yi-An Ma• 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | ARC-C | Accuracy | 79.601 | 215 |
| Commonsense Reasoning | OBQA | Accuracy | 88.4 | 187 |
| Commonsense Reasoning | ARC-E | Accuracy | 88.66 | 152 |
| Open-ended generation | TriviaQA | ECE | 6.63 | 37 |
| Multiple-choice Question Answering | ARC-C | Accuracy | 79 | 28 |
| Multiple-choice Question Answering | ARC-E | Accuracy | 87.8 | 16 |
| Domain-specific Reasoning | ClimateQA | Accuracy (ACC) | 79.97 | 9 |
| Multiple-choice Question Answering | OBQA | Accuracy | 0.884 | 8 |
| Multiple-choice Question Answering | ENG | Accuracy | 61.13 | 8 |
| Multiple-choice Question Answering | Law | Accuracy | 45.4 | 8 |

(10 of 13 rows shown)
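The TriviaQA row reports Expected Calibration Error (ECE), the metric the abstract uses to measure calibration. For reference, here is a minimal sketch of the standard equal-width-binned ECE: predictions are grouped into confidence bins, and the gap between each bin's accuracy and mean confidence is summed, weighted by the bin's share of samples. The default of 10 bins and the name `expected_calibration_error` are assumptions; the binning used for the reported numbers is not stated on this page.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-binned ECE: sum_b (|B_b| / n) * |acc(B_b) - conf(B_b)|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b+1]]; a confidence of exactly 0 goes to bin 0.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the fraction of samples in bin b
    return float(ece)


# An overconfident model: 90% confidence, but only half the answers are right.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
# A calibrated model: 90% confidence and 9 of 10 answers right.
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
```

Here `overconfident` is 0.4 and `calibrated` is 0 (up to floating-point error). ECE is often reported multiplied by 100, so values like the TriviaQA entry are likely on a percentage scale, though the page does not say so explicitly.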
