FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
About
The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also made factual errors in the generated text harder to identify. In particular: (1) A wider range of tasks now faces an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) Explicit evidence is scarce during the fact-checking process. With these challenges in mind, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool, along with a ChatGPT plugin interface, at https://github.com/GAIR-NLP/factool.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Veracity Assessment | FactCheck-Bench | Macro-F1 | 73 | 26 |
| Fact Checking | FeLMWk | F1 (True) | 0.6 | 16 |
| Veracity Assessment | FacTool-QA | True F1 | 84 | 12 |
| Veracity Assessment | BingCheck | True F1 | 0.68 | 12 |
| Fact Checking | HOVER | -- | -- | 12 |
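The benchmarks above report F1 on the True label and macro-F1. As a reading aid, the following is a minimal sketch of how these two metrics are commonly computed for binary factuality labels. The function names and the toy labels are illustrative assumptions, not taken from the FacTool code or from any of the listed benchmarks, and each benchmark may apply its own scaling or averaging.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[bool], pred: Sequence[bool], label: bool) -> float:
    """F1 score treating `label` as the positive class (hypothetical helper)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Unweighted mean of the F1 scores for the True and False classes."""
    return (f1_for_label(gold, pred, True) + f1_for_label(gold, pred, False)) / 2


# Toy data (made up for illustration): True means "claim is factual".
gold = [True, True, False, False]
pred = [True, False, False, False]

true_f1 = f1_for_label(gold, pred, True)
false_f1 = f1_for_label(gold, pred, False)
macro = macro_f1(gold, pred)
print(f"True F1={true_f1:.3f}  False F1={false_f1:.3f}  Macro-F1={macro:.3f}")
```

True F1 measures only how well factual claims are recognized, while macro-F1 gives the True and False classes equal weight, so the two can differ noticeably when the classes are imbalanced.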