RADAR: Robust AI-Text Detection via Adversarial Learning
About
Recent advances in large language models (LLMs) and the growing popularity of ChatGPT-like applications have blurred the boundary between human and machine writing in high-quality text generation. Alongside the anticipated revolutionary changes to technology and society, the difficulty of distinguishing LLM-generated texts (AI-text) from human-generated texts poses new challenges of misuse and fairness, such as fake content generation, plagiarism, and false accusations of innocent writers. Existing work shows that current AI-text detectors are not robust to LLM-based paraphrasing. This paper aims to close that gap by proposing a new framework called RADAR, which jointly trains a robust AI-text detector and a paraphraser via adversarial learning. The paraphraser's goal is to generate realistic content that evades AI-text detection. RADAR uses feedback from the detector to update the paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, experimental results show that RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. We also identify the strong transferability of RADAR from instruction-tuned LLMs to other LLMs, and evaluate the improved capability of RADAR via GPT-3.5-Turbo.
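The alternating updates described above can be illustrated with a deliberately tiny sketch. It is not RADAR's training procedure: the abstract only states that the detector's feedback updates the paraphraser and vice versa, so everything concrete here is an assumption made for illustration. Each text is reduced to one hypothetical scalar "style feature", the detector is a logistic regression with parameters `w` and `b`, and the paraphraser is a single shift `theta` applied to AI-text features. The data and the names `detector_step`, `paraphraser_step`, and `train` are invented for this example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical 1-D "style feature" per text (toy data, not from the paper).
HUMAN = [-0.5, 0.0, 0.5]
AI = [1.5, 2.0, 2.5]


def detector_step(w, b, theta, lr=0.1):
    """One gradient step on log loss. Paraphrased AI-text keeps the AI label."""
    samples = [(x, 0.0) for x in HUMAN] + [(x, 1.0) for x in AI]
    samples += [(x - theta, 1.0) for x in AI]  # current paraphraser output
    n = len(samples)
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in samples) / n
    grad_b = sum(sigmoid(w * x + b) - y for x, y in samples) / n
    return w - lr * grad_w, b - lr * grad_b


def paraphraser_step(w, b, theta, lr=0.1):
    """One gradient step that lowers the detector's mean AI-probability
    on paraphrased text. The detector's output is the only training signal."""
    grad = 0.0
    for x in AI:
        p = sigmoid(w * (x - theta) + b)
        grad += -w * p * (1.0 - p)  # d/dtheta of sigmoid(w * (x - theta) + b)
    grad /= len(AI)
    return theta - lr * grad


def train(rounds=20):
    """Alternate detector and paraphraser updates for a fixed number of rounds."""
    w, b, theta = 0.0, 0.0, 0.0
    for _ in range(rounds):
        w, b = detector_step(w, b, theta)      # detector learns from paraphrases
        theta = paraphraser_step(w, b, theta)  # paraphraser learns from detector
    return w, b, theta


w, b, theta = train()
print(f"detector w={w:.3f}, b={b:.3f}; paraphraser shift theta={theta:.3f}")
```

The point of the sketch is only the control flow: each model is updated against the other's current state. In the setting the abstract describes, both components are language models rather than scalar parameters.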
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| AI-generated text detection | XSum Generated by ChatGPT (test) | AUROC | 0.9972 | 60 |
| AI-generated text detection | XSum Generated by GPT-4 (test) | AUROC | 0.9931 | 60 |
| AI-generated text detection | XSum Generated by Claude3 (test) | AUROC | 99.52 | 60 |
| AI-generated text detection | AcademicResearch | AUC | 78.7 | 36 |
| Machine-generated text detection | SemEval (test) | ASR | 17.52 | 26 |
| AI-generated text detection | NewsArticle | AUC | 92.6 | 24 |
| AI-generated text detection | LegalDocument | AUC | 0.917 | 24 |
| AI-generated text detection | TravelTourism | AUC | 81.6 | 24 |
| AI-generated text detection | Entertainment | AUC | 0.911 | 24 |
| AI-generated text detection | PersonalCommunication | AUC | 0.632 | 24 |
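
Most rows above report AUROC (also written AUC), and the values appear on two scales: fractions such as 0.9972 and percentages such as 99.52. As a reading aid, the following minimal sketch computes AUROC from its pairwise definition: the probability that a randomly chosen AI-text receives a higher detector score than a randomly chosen human text, with ties counted as one half. The function name `auroc` and the labels and scores are made up for illustration and are unrelated to the reported results.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for AI-text, 0 for human text.
    scores: detector scores, higher meaning "more likely AI-text".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical detector outputs for three AI-texts and three human texts.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

value = auroc(labels, scores)
print(f"AUROC as a fraction: {value:.4f}")
print(f"AUROC as a percentage: {100 * value:.2f}")
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; rank-based implementations give the same value more efficiently on large test sets.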