FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

About

The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: (1) A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) There is a scarcity of explicit evidence available during the process of fact checking. With the above challenges in mind, in this paper, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool, together with a ChatGPT plugin interface, at https://github.com/GAIR-NLP/factool.

I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu • 2023

Related benchmarks

Task	Dataset	Result
Hallucination Detection	HaluEval	--	131
Hallucination Detection	MMLU-Pro	Accuracy 62.26	30
Hallucination Detection	XTRUST	Accuracy 48.37	30
Veracity Assessment	FactCheck-Bench	Macro-F1 73	26
Fact Checking	FeLMWk	F1 (True) 0.6	16
Veracity Assessment	FacTool-QA	True F1 84	12
Veracity Assessment	BingCheck	True F1 0.68	12
Fact Checking	HOVER	--	12
