MultiHaluDet: Multilingual Hallucination Detection via LLM Hidden State Probing
About
Hallucinations in Large Language Models (LLMs) are a critical barrier to reliable deployment, and the problem is worse in non-English and resource-constrained settings. Existing detection approaches that rely on output-confidence heuristics or single-layer internal representations frequently fail to capture complex factual inconsistencies across diverse languages. To address this, we introduce MultiHaluDet, a three-stage stacking framework that detects multilingual hallucinations by probing the full hidden-state trajectories of frozen LLMs, without language-specific fine-tuning. Our method extracts sequential features across multiple layers and processes them with a hybrid architecture that combines multi-scale attention and self-attention pooling. By generating out-of-fold embeddings that feed a calibrated ensemble of classical classifiers, MultiHaluDet captures both fine-grained and coarse-grained patterns of factual inconsistency. Experiments show state-of-the-art detection performance, reaching up to 98.55% AUROC on the English HaluEval and TriviaQA benchmarks with Mistral-7B and LLaMA2-7B. We also evaluate cross-lingual generalization across high-resource (French), medium-resource (Bangla), and low-resource (Amharic) languages. MultiHaluDet consistently outperforms baselines and transfers hallucination detection across these typologically diverse resource tiers, although AUROC declines with resource level: 95.5 to 96.2% on French, 87.6 to 89.1% on Bangla, and 75.8 to 78.5% on Amharic.
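The out-of-fold step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the synthetic 2-D features stand in for pooled hidden-state embeddings, a nearest-centroid scorer stands in for the neural encoder, and the names `kfold_indices`, `out_of_fold_scores`, and `auroc` are hypothetical helpers introduced here. The point it shows is that each sample's stage-one score comes from a model fitted on folds that exclude that sample, so a later classifier can be trained on those scores without label leakage.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def out_of_fold_scores(X, y, k=5, seed=0):
    """Return one score per sample, each from a model that never saw it.

    Stand-in base model (an assumption, not the paper's encoder):
    nearest-centroid. Higher score means "closer to class 1".
    """
    scores = [None] * len(X)
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        c0 = centroid([X[i] for i in train if y[i] == 0])
        c1 = centroid([X[i] for i in train if y[i] == 1])
        for i in held_out:
            scores[i] = sq_dist(X[i], c0) - sq_dist(X[i], c1)
    return scores


def auroc(y, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in data: label 1 plays the role of "hallucinated".
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

oof = out_of_fold_scores(X, y)
print("out-of-fold AUROC:", auroc(y, oof))
```

In a full stacking pipeline, the `oof` values (or higher-dimensional out-of-fold embeddings) would become the input features of the next stage, here described as a calibrated ensemble of classical classifiers.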
Related benchmarks
| Task | Dataset | AUROC (%) | Rank |
|---|---|---|---|
| Hallucination Detection | TriviaQA (test) | 98.3 | 243 |
| Hallucination Detection | HaluEval (test) | 98.55 | 176 |
| Hallucination Detection | HaluEval English - Base (test) | 98.5 | 4 |
| Hallucination Detection | HaluEval French - High (test) | 96.2 | 4 |
| Hallucination Detection | TriviaQA French - High (test) | 95.5 | 4 |
| Hallucination Detection | HaluEval Bangla - Medium (test) | 89.1 | 4 |
| Hallucination Detection | TriviaQA Bangla - Medium (test) | 87.6 | 4 |
| Hallucination Detection | HaluEval Amharic - Low (test) | 78.5 | 4 |
| Hallucination Detection | TriviaQA Amharic - Low (test) | 75.8 | 4 |