MAGE

Benchmarks

Task Name	Dataset Name	SOTA Result	
Machine-Generated Text Detection	MAGE	AUROC (Avg): 99.1	24
Detection Evasion	MAGE	ASR: 99.9	18
AI-text detection	MAGE in-distribution (test)	AUROC: 81	16
Detection of Machine-Generated Text	MAGE Main Experimental Supplement (test)	TP@20%: 85.12	14
Detection of Machine-Generated Text	MAGE (test)	TP @ 20% Threshold: 85.12	14
Machine-Generated Text Detection	MAGE COLING2025 (val)	AUC: 79.55	13
Paraphrase Quality Assessment	MAGE shared subset (evaluation 300 AI-written samples)	PPL: 16.25	12
AI Detector Evasion	MAGE (evaluation set)	ASR (τ=0.5): 91.3	12
Machine-generated text detection	MAGE Unseen Domains & Unseen Model (test)	AUROC: 0.98	11
AI-generated text detection	MAGE BigScience 1.0 (test)	Accuracy: 96.7	8
AI-generated text detection	MAGE GLM 1.0 (test)	Accuracy: 94.1	8
AI-generated text detection	MAGE OPT 1.0 (test)	Accuracy: 89.1	8
AI-generated text detection	MAGE (LLaMA) 1.0 (test)	Accuracy: 88	8
AI-generated text detection	MAGE GPT 1.0 (test)	Accuracy: 82.7	8
AI-generated text detection	MAGE FLAN-T5 1.0 (test)	Accuracy: 68.9	8
Detection of LLM-generated text	MAGE Topic-based 3.5-turbo	Detection Accuracy: 100	8
Detection of LLM-generated text	MAGE News Topic-based 3.5-turbo	Detection Performance: 99.95	8
LLM-generated text detection	MAGE QA short text (<= 30 words)	AUROC: 0.9747	8
LLM-generated text detection	MAGE News short text (<= 30 words)	AUROC: 93.48	8
Detection of LLM generated text	MAGE QA	ROC AUC (FPR=1%): 65.33	8
Detection of LLM generated text	MAGE News	ROC AUC @ FPR=1%: 0.6577	8
LLM-generated text detection	MAGE DIPPER attack	Human Score: 77.44	8
Machine-generated text detection	MAGE Arbitrary-domains & Arbitrary-models (test)	Human Recall: 0.9572	5
Machine-generated text detection	MAGE Paraphrasing Attack (test)	Human Recall: 79.66	4
Machine-generated text detection	MAGE Mean across 10 domains	AUROC: 0.947	3

Showing 25 of 43 rows
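Several rows report AUROC, and some restrict attention to a low false-positive rate (e.g. FPR=1%). The table does not say how each paper computed these numbers, so the following is only a minimal sketch of the standard definitions, not the evaluation code of MAGE or of any listed result. It assumes label 1 means machine-generated, label 0 means human-written, and a higher detector score means "more likely machine-generated". The function names `auroc` and `tpr_at_fpr` are hypothetical. Note that the table mixes scales (AUROC 0.98 vs. 99.1), so fractions from this sketch need multiplying by 100 to compare with rows reported as percentages.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the Mann-Whitney probability that a random positive
    (label 1, assumed machine-generated) outscores a random negative
    (label 0, assumed human-written). Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Highest true-positive rate reachable while keeping the
    false-positive rate <= max_fpr. Every observed score is tried as a
    threshold t, predicting "machine-generated" when score >= t."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0  # a threshold above every score gives TPR = FPR = 0
    for t in sorted(set(scores)):
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p >= t for p in pos) / len(pos)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    # Toy detector scores: first four are machine-generated, last four human.
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(auroc(scores, labels))            # 0.875
    print(tpr_at_fpr(scores, labels, 0.0))  # 0.75
    print(tpr_at_fpr(scores, labels, 0.5))  # 1.0
```

The pairwise loop is quadratic in the number of samples, which keeps the definition transparent; for large evaluation sets a rank-based or library implementation would be preferable.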