MAGE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Machine-Generated Text Detection | MAGE | AUROC (Avg) 99.1 | 24 |
| Detection Evasion | MAGE | ASR 99.9 | 18 |
| Detection of Machine-Generated Text | MAGE Main Experimental Supplement (test) | TP@20% 85.12 | 14 |
| Detection of Machine-Generated Text | MAGE (test) | TP @ 20% Threshold 85.12 | 14 |
| Machine-Generated Text Detection | MAGE COLING2025 (val) | AUC 79.55 | 13 |
| Paraphrase Quality Assessment | MAGE shared subset (evaluation 300 AI-written samples) | PPL 16.25 | 12 |
| AI Detector Evasion | MAGE (evaluation set) | ASR (τ=0.5) 91.3 | 12 |
| Machine-generated text detection | MAGE Unseen Domains & Unseen Model (test) | Human Recall 95.65 | 9 |
| AI-generated text detection | MAGE BigScience 1.0 (test) | Accuracy 96.7 | 8 |
| AI-generated text detection | MAGE GLM 1.0 (test) | Accuracy 94.1 | 8 |
| AI-generated text detection | MAGE OPT 1.0 (test) | Accuracy 89.1 | 8 |
| AI-generated text detection | MAGE (LLaMA) 1.0 (test) | Accuracy 88 | 8 |
| AI-generated text detection | MAGE GPT 1.0 (test) | Accuracy 82.7 | 8 |
| AI-generated text detection | MAGE FLAN-T5 1.0 (test) | Accuracy 68.9 | 8 |
| Detection of LLM-generated text | MAGE Topic-based 3.5-turbo | Detection Accuracy 100 | 8 |
| Detection of LLM-generated text | MAGE News Topic-based 3.5-turbo | Detection Performance 99.95 | 8 |
| LLM-generated text detection | MAGE QA short text (<= 30 words) | AUROC 0.9747 | 8 |
| LLM-generated text detection | MAGE News short text (<= 30 words) | AUROC 93.48 | 8 |
| Detection of LLM generated text | MAGE QA | ROC AUC (FPR=1%) 65.33 | 8 |
| Detection of LLM generated text | MAGE News | ROC AUC @ FPR=1% 0.6577 | 8 |
| LLM-generated text detection | MAGE DIPPER attack | Human Score 77.44 | 8 |
| Machine-generated text detection | MAGE Arbitrary-domains & Arbitrary-models (test) | Human Recall 0.9572 | 5 |
| Machine-generated text detection | MAGE Paraphrasing Attack (test) | Human Recall 79.66 | 4 |
| AI-Generated Text Detection | MAGE DeepSeek-R1 OOD | Accuracy 71 | 3 |
| AI-Generated Text Detection | MAGE Claude-sonnet-4-5 OOD | Accuracy 57.1 | 3 |
Showing 25 of 27 rows