Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature
About
Large language models (LLMs) have shown the ability to produce fluent and cogent content, presenting both productivity opportunities and societal risks. To build trustworthy AI systems, it is imperative to distinguish between machine-generated and human-authored content. The leading zero-shot detector, DetectGPT, showcases commendable performance but is marred by its intensive computational costs. In this paper, we introduce the concept of conditional probability curvature to elucidate discrepancies in word choices between LLMs and humans within a given context. Utilizing this curvature as a foundational metric, we present **Fast-DetectGPT**, an optimized zero-shot detector, which substitutes DetectGPT's perturbation step with a more efficient sampling step. Our evaluations on various datasets, source models, and test conditions indicate that Fast-DetectGPT not only surpasses DetectGPT by around 75% in relative terms in both the white-box and black-box settings but also accelerates the detection process by a factor of 340, as detailed in Table 1. See https://github.com/baoguangsheng/fast-detect-gpt for code, data, and results.
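The abstract does not spell out the curvature formula, so the following is only a minimal sketch of one way such a statistic could be computed. It assumes the curvature is the log-probability of the observed tokens, normalized by the mean and variance of the log-probability of alternative tokens drawn from a sampling model at each position. The function names (`log_softmax`, `conditional_probability_curvature`) and the random toy logits are hypothetical. A real detector would take per-position logits from an actual language model.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def conditional_probability_curvature(scoring_logits, sampling_logits, token_ids):
    """Hypothetical sketch of a conditional probability curvature score.

    scoring_logits, sampling_logits: arrays of shape (seq_len, vocab_size),
        the per-position next-token logits given the preceding context.
    token_ids: array of shape (seq_len,), the tokens actually observed.
    """
    log_p = log_softmax(scoring_logits)        # scoring model log-probabilities
    q = np.exp(log_softmax(sampling_logits))   # sampling model probabilities

    # Log-probability of the tokens that actually appear in the passage.
    observed = log_p[np.arange(len(token_ids)), token_ids]

    # Expected log-probability and its variance if each token were instead
    # sampled from q, computed in closed form over the vocabulary.
    mean_alt = (q * log_p).sum(axis=-1)
    var_alt = (q * log_p ** 2).sum(axis=-1) - mean_alt ** 2

    # Assumption: positions are aggregated by summing before normalizing.
    return (observed.sum() - mean_alt.sum()) / np.sqrt(var_alt.sum())

# Toy data: random logits stand in for a real model's outputs.
rng = np.random.default_rng(0)
logits = rng.normal(size=(20, 50))
greedy_tokens = logits.argmax(axis=-1)        # tokens the model itself prefers
random_tokens = rng.integers(0, 50, size=20)  # tokens chosen without regard to the model

greedy_score = conditional_probability_curvature(logits, logits, greedy_tokens)
random_score = conditional_probability_curvature(logits, logits, random_tokens)
print(f"greedy: {greedy_score:.2f}  random: {random_score:.2f}")
```

In this toy setup the model-preferred tokens receive a higher score than the arbitrary ones, which mirrors the intuition in the abstract: text whose word choices track the model's conditional distribution scores higher than text that does not. Because the alternatives are handled with the model's own per-position distributions rather than a separate perturbation model, no extra rewriting pass is needed, which is the efficiency argument the abstract makes for replacing the perturbation step with sampling.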
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Machine-generated text detection | MGT benchmark Essay | AUROC 99.6 | 129 |
| LGT Detection | Fast-DetectGPT XSum (test) | AUROC 99.7 | 96 |
| LGT Detection | Fast-DetectGPT PubMed (test) | AUROC 0.928 | 96 |
| AI-generated text detection | XSum Generated by ChatGPT (test) | AUROC 0.9907 | 60 |
| AI-generated text detection | XSum Generated by Claude3 (test) | AUROC 99.42 | 60 |
| AI-generated text detection | XSum Generated by GPT-4 (test) | AUROC 0.9137 | 60 |
| LGT Detection | WritingPrompts-small Fast-DetectGPT benchmark | AUROC 99.9 | 54 |
| LGT Detection | WritingPrompts small Fast-DetectGPT benchmark (test) | AUROC 99.9 | 54 |
| LGT Detection | XSum Fast-DetectGPT benchmark | AUROC 99.7 | 54 |
| LGT Detection | PubMed Fast-DetectGPT benchmark | AUROC 0.908 | 54 |