Shared Doubt: Zero-shot Cross-Lingual Confidence Estimation for Language Models
About
Confidence estimation (CE), i.e. quantifying the reliability of a model's prediction, has attracted great interest in the context of large language models (LLMs). However, most studies focus on English, ignoring the multilingual reality of LLM usage, while many CE methods degrade or require retraining across languages. To address this gap, we investigate whether multilingual LLMs encode shared, language-transferable confidence features. We use a lightweight linear probe that predicts answer correctness directly from intermediate representations. Trained monolingually, the probe generalizes zero-shot to unseen, typologically diverse languages without target-language supervision. Learned layer weights and multiple ablations reveal that confidence features concentrate in middle layers across languages, suggesting a shared confidence subspace. While zero-shot cross-lingual performance depends on similarity to the source language, the probe provides a strong baseline without any retraining and compares favorably to other popular confidence estimation methods.
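The probe described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes hidden states have already been extracted into an array `H` of shape `(examples, layers, hidden_dim)`, it uses a softmax over per-layer scalars as one plausible way to realize "learned layer weights", and it trains with plain gradient descent on synthetic data. All names (`train_probe`, `predict_confidence`, `H`, `alpha`) are hypothetical.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_probe(H, y, steps=300, lr=0.5, seed=0):
    """Fit a layer-weighted linear probe with logistic loss.

    H: (N, L, D) per-layer representations (assumed pre-extracted).
    y: (N,) 1 if the model's answer was correct, else 0.
    Returns the probe weights w, bias b, and layer weights alpha.
    """
    n, n_layers, dim = H.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=dim)
    b = 0.0
    a = np.zeros(n_layers)  # unnormalized layer scores
    for _ in range(steps):
        alpha = softmax(a)
        pooled = np.einsum("l,nld->nd", alpha, H)  # weighted sum over layers
        p = sigmoid(pooled @ w + b)
        err = (p - y) / n  # gradient of mean logistic loss w.r.t. the logits
        grad_w = pooled.T @ err
        grad_b = err.sum()
        s = np.einsum("n,nld,d->l", err, H, w)  # gradient w.r.t. alpha
        grad_a = alpha * (s - alpha @ s)        # chain rule through softmax
        w -= lr * grad_w
        b -= lr * grad_b
        a -= lr * grad_a
    return w, b, softmax(a)

def predict_confidence(H, w, b, alpha):
    """Return the probe's estimated probability that each answer is correct."""
    return sigmoid(np.einsum("l,nld->nd", alpha, H) @ w + b)

# Synthetic stand-in for source-language data: only layer 2 carries signal.
rng = np.random.default_rng(1)
N, L, D = 200, 4, 8
y = rng.integers(0, 2, size=N).astype(float)
H = rng.normal(size=(N, L, D))
H[:, 2, :] += 2.0 * (2 * y - 1)[:, None]

w, b, alpha = train_probe(H, y)
conf = predict_confidence(H, w, b, alpha)
accuracy = float(((conf > 0.5) == (y == 1)).mean())
```

In a zero-shot cross-lingual setting, `train_probe` would be fit on representations from a single source language, and `predict_confidence` would then be applied unchanged to representations from other languages. Inspecting `alpha` after training is one way to see which layers the probe relies on.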
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Confidence Estimation | MKQA (test) | AUROC 0.82 | 14 |
| Confidence Estimation | Global MMLU (test) | AUROC 0.78 | 14 |
| Confidence Estimation | MKQA Spanish (es, test) | AUROC 82 | 5 |
| Confidence Estimation | Global-MMLU Spanish (es, test) | AUROC 74 | 5 |
| Confidence Estimation | MKQA Russian (ru, test) | AUROC 80 | 5 |
| Confidence Estimation | Global-MMLU Russian (test) | AUROC 72 | 5 |
| Confidence Estimation | Global-MMLU Japanese (ja, test) | AUROC 72 | 5 |
| Confidence Estimation | MKQA Japanese (ja, test) | AUROC 64 | 5 |
| Confidence Estimation | MKQA French | AUROC 0.82 | 2 |
| Confidence Estimation | MKQA English | AUROC 76 | 2 |
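For readers unfamiliar with the metric, AUROC here is the probability that a randomly chosen correct answer receives a higher confidence score than a randomly chosen incorrect one. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the toy inputs are illustrative only and are not taken from the paper. Since AUROC lies in [0, 1], entries such as 82 are presumably the same quantity expressed as a percentage.

```python
import numpy as np

def auroc(confidence, correct):
    """Pairwise AUROC: fraction of (correct, incorrect) answer pairs in which
    the correct answer has the higher confidence; ties count as one half."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos = confidence[correct]
    neg = confidence[~correct]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one correct and one incorrect answer")
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Toy example: five answers with confidence scores and correctness labels.
scores = [0.9, 0.4, 0.35, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
value = auroc(scores, labels)             # 5 of 6 pairs are ordered correctly
as_percent = round(100 * value)           # the same value on a percentage scale
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; a rank-based computation is the usual choice for large test sets.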