
UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers

About

Neural NLP models are often miscalibrated, assigning high confidence to incorrect predictions, which undermines selective prediction and high-stakes deployment. Post-hoc calibration methods adjust output probabilities but leave internal computation unchanged, while ensemble and Bayesian approaches improve uncertainty estimates at substantial training or storage cost. We propose UAT-LITE, an inference-time framework that makes self-attention uncertainty-aware in pretrained transformer classifiers using approximate Bayesian inference via Monte Carlo dropout. Token-level epistemic uncertainty is estimated from stochastic forward passes and used to modulate self-attention during contextualization, without modifying pretrained weights or training objectives. We additionally introduce a layerwise variance decomposition to diagnose how predictive uncertainty accumulates across transformer depth. Across SQuAD 2.0 answerability, MNLI, and SST-2, UAT-LITE reduces Expected Calibration Error by approximately 20% on average relative to a fine-tuned BERT-base baseline while preserving task accuracy, and improves selective prediction and robustness under distribution shift.
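The core mechanism described above can be sketched in a heavily simplified form: run several dropout-perturbed forward passes, take the per-token variance as an epistemic uncertainty estimate, then penalize attention logits for uncertain key tokens. This is a minimal NumPy illustration under assumed simplifications (dropout applied directly to token representations rather than inside transformer layers; the function names, the subtractive logit penalty, and the `lam` scaling parameter are illustrative, not the paper's exact formulation).

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_uncertainty(hidden, n_passes=8, p_drop=0.1):
    """Token-level epistemic uncertainty via Monte Carlo dropout.

    hidden: (seq_len, d) token representations.
    Simplification: each "stochastic pass" is just an inverted-dropout
    mask applied to the representations. Returns the per-token variance
    across passes, averaged over feature dimensions -> shape (seq_len,).
    """
    samples = []
    for _ in range(n_passes):
        mask = rng.binomial(1, 1 - p_drop, hidden.shape) / (1 - p_drop)
        samples.append(hidden * mask)
    samples = np.stack(samples)               # (n_passes, seq_len, d)
    return samples.var(axis=0).mean(axis=-1)  # (seq_len,)

def uncertainty_aware_attention(q, k, v, u, lam=1.0):
    """Scaled dot-product attention with logits penalized by the
    uncertainty u of each key token (illustrative modulation scheme)."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) - lam * u[None, :]
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)     # rows sum to 1
    return w @ v, w
```

In this toy setting, tokens whose representations vary more across dropout passes receive proportionally less attention mass; no weights are retrained, matching the inference-time framing of the abstract.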

Elias Hossain, Shubhashis Roy Dipta, Subash Neupane, Rajib Rana, Ravid Shwartz-Ziv, Ivan Garibay, Niloofar Yousefi • 2026

Related benchmarks

Task                          | Dataset                   | Result               | Rank
Question Answering            | MedQA                     | Accuracy 24.9        | 70
Natural Language Inference    | MNLI (val)                | --                   | 26
Question Answering            | SQuAD v2.0 (val)          | --                   | 21
Question Answering            | PubMedQA                  | Accuracy 64          | 9
Distribution Shift Robustness | MNLI matched → mismatched | ID ECE 0.0219        | 2
Selective Prediction          | MNLI                      | Coverage@0.9 86.92   | 2
Selective Prediction          | SQuAD 2.0                 | Coverage@0.9 69.04   | 2
