InFi-Check: Interpretable and Fine-Grained Fact-Checking of LLMs
About
Large language models (LLMs) often hallucinate, yet most existing fact-checking methods treat factuality evaluation as a binary classification problem, offering limited interpretability and failing to capture fine-grained error types. In this paper, we introduce InFi-Check, a framework for interpretable and fine-grained fact-checking of LLM outputs. Specifically, we first propose a controlled data-synthesis pipeline that generates high-quality data featuring explicit evidence, fine-grained error-type labels, justifications, and corrections. Based on this pipeline, we construct large-scale training data and InFi-Check-FG, a manually verified benchmark for fine-grained fact-checking of LLM outputs. Building on this training data, we propose InFi-Checker, a model that jointly provides supporting evidence, classifies fine-grained error types, and produces justifications along with corrections. Experiments show that InFi-Checker achieves state-of-the-art performance on InFi-Check-FG and generalizes strongly across various downstream tasks, significantly improving the utility and trustworthiness of factuality evaluation.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fine-grained Hallucination Detection | InFi-Check-FG (test) | BAcc (Normalized) | 92.34 | 30 |
| Veracity Assessment | FactCheck-Bench | Macro-F1 | 88 | 26 |
| Hallucination Detection | FRANK | Balanced Acc | 77.2 | 18 |
| Fact Checking | InFi-Check-FG 1.0 (test) | PredE | 93.51 | 18 |
| Fact Checking | ExpertQA | -- | -- | 15 |
| Binary Fact-checking | Claim Verify | Macro-F1 | 0.896 | 14 |
| Binary Fact-checking | MediaSum | Macro-F1 | 80.4 | 14 |
| Binary Fact-checking | MeetingBank | Macro-F1 | 78.5 | 14 |
| Binary Fact-checking | Reveal | Macro-F1 | 90 | 14 |
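Most results above are reported as Balanced Accuracy or Macro-F1, both chosen because fine-grained error-type labels are typically imbalanced. As a reference, here is a minimal pure-Python sketch of the two metrics; the label names in the toy example are hypothetical and not taken from InFi-Check-FG:

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recall; insensitive to class imbalance.
    per_class = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for t, p in zip(y_true, y_pred):
        per_class[t][1] += 1
        if t == p:
            per_class[t][0] += 1
    return sum(c / n for c, n in per_class.values()) / len(per_class)

def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores.
    labels = set(y_true) | set(y_pred)
    f1s = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical fine-grained error-type labels, for illustration only.
y_true = ["entity", "entity", "relation", "supported"]
y_pred = ["entity", "relation", "relation", "supported"]
```

Because both metrics average per-class scores rather than per-example scores, a checker that ignores rare error types is penalized even if its overall accuracy looks high.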