MAGE: Machine-generated Text Detection in the Wild

About

Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods on specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Empirical results show that distinguishing machine-generated texts from human-authored ones is challenging across various scenarios, especially in out-of-distribution settings. These challenges are due to the decreasing linguistic distinctions between the two sources. Despite these challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility for application scenarios. We release our resources at https://github.com/yafuly/MAGE.

Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang • 2023

Related benchmarks

Task | Dataset | Result | Rank
LLM-generated text detection | RAID Abstract | ROC AUC: 100 | 8
LLM-generated text detection | RAID Reviews | ROC AUC: 1 | 8
Detection of LLM generated text | MAGE Topic-based 3.5-turbo | Detection Accuracy: 99.98 | 8
Detection of LLM generated text | MAGE News | ROC AUC @ FPR=1%: 0.00e+0 | 8
Detection of LLM generated text | Xsum | Paraphrase (4o-mini): 0.0155 | 8
Detection of LLM generated text | MAGE News Topic-based 3.5-turbo | Detection Performance: 99.57 | 8
LLM-generated text detection | MAGE News short text (<= 30 words) | AUROC: 89.44 | 8
LLM-generated text detection | MAGE QA short text (<= 30 words) | AUROC: 0.9345 | 8
LLM-generated text detection | RAID Poetry | ROC AUC: 95.88 | 8
LLM-generated text detection | RAID Wikipedia-related samples | GPT-4 Performance Score: 97.05 | 8
Showing 10 of 62 rows
