GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking
About
Large language models (LLMs) are widely used, but they often generate subtle factual errors, especially in long-form text. These errors can be critical in specialized domains such as medicine. Existing fact-checking methods that rely on grounding documents face two main challenges: (1) they struggle to understand complex multi-hop relations in long documents, often overlooking subtle factual errors; (2) most specialized methods rely on pairwise comparisons, requiring multiple model calls and incurring high resource and computational costs. To address these challenges, we propose GraphCheck, a fact-checking framework that uses extracted knowledge graphs to enhance text representation. Graph Neural Networks further process these graphs as a soft prompt, enabling LLMs to incorporate structured knowledge more effectively. Enhanced with graph-based reasoning, GraphCheck captures multi-hop reasoning chains that are often overlooked by existing methods, enabling precise and efficient fact-checking in a single inference call. Experimental results on seven benchmarks spanning both general and medical domains demonstrate up to a 7.1% overall improvement over baseline models. Notably, GraphCheck outperforms existing specialized fact-checkers and achieves performance comparable to state-of-the-art LLMs, such as DeepSeek-V3 and OpenAI-o1, with significantly fewer parameters.
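To make the graph-as-soft-prompt idea concrete, here is a minimal sketch (not the authors' implementation): triples extracted from a claim form a knowledge graph, one round of mean-aggregation message passing updates the node embeddings, and the result is pooled into a single vector standing in for the soft prompt that would be prepended to the LLM's input embeddings. The triples, 2-d embeddings, and single-layer GNN are illustrative assumptions.

```python
# Sketch only: the real GraphCheck GNN, triple extractor, and projection into
# the LLM embedding space are more elaborate; everything below is a toy stand-in.
from collections import defaultdict

def message_pass(nodes, edges, emb):
    """One GNN layer: each node averages its own and its neighbours' embeddings."""
    neigh = defaultdict(list)
    for head, _rel, tail in edges:   # treat edges as undirected for this sketch
        neigh[head].append(tail)
        neigh[tail].append(head)
    out = {}
    for n in nodes:
        vecs = [emb[n]] + [emb[m] for m in neigh[n]]
        out[n] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return out

def soft_prompt(emb):
    """Mean-pool node embeddings into one vector (stand-in for the soft prompt)."""
    vecs = list(emb.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Toy triples extracted from a medical claim, with hypothetical 2-d embeddings
edges = [("aspirin", "treats", "headache"), ("aspirin", "is_a", "NSAID")]
nodes = ["aspirin", "headache", "NSAID"]
emb = {"aspirin": [1.0, 0.0], "headache": [0.0, 1.0], "NSAID": [1.0, 1.0]}

updated = message_pass(nodes, edges, emb)
prompt_vec = soft_prompt(updated)   # would be prepended to the LLM input
```

Stacking several such layers is what lets information propagate along multi-hop chains in the graph, which is the property the framework relies on for catching multi-hop factual errors in a single inference call.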
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fact Checking | PubHealth | Balanced Accuracy | 73.6 | 26 |
| Fact Checking | COVID-Fact | Balanced Accuracy | 71.5 | 22 |
| Scientific Fact Verification | SciFact | -- | -- | 16 |
| Fact Checking | ExpertQA | Balanced Accuracy | 60.3 | 15 |
| Fact Checking | SciFact | Balanced Accuracy | 89.4 | 15 |
| Fact Checking | AggreFact CNN | Balanced Accuracy | 66.5 | 15 |
| Fact Checking | SummEval | Balanced Accuracy | 71.0 | 15 |
| Fact Checking | Average across General and Medical Domains | Overall Average | 71.1 | 15 |
| Fact Checking | AggreFact Xsum | Balanced Accuracy | 72.9 | 15 |
| Fact Checking | Reveal | Balanced Accuracy | 89.7 | 7 |