MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
About
Large language models are now embedded in everyday writing workflows, making reliable AI-generated text detection important for academic integrity, content moderation, and provenance tracking. In practice, however, a detector must do more than achieve high aggregate AUROC on clean, in-distribution human and AI text: it should remain robust to attacks and adversarial rewrites, transfer to unseen generators and domains, and operate at low false-positive rates (FPR). Most existing detectors optimize a single AI/Human objective, giving the representation little incentive to learn generator, attack, or domain structure once the binary task saturates. We introduce MELD (Multi-Task Equilibrated Learning Detector), a deployable detector for AI-generated text that enriches binary detection with auxiliary supervision. MELD attaches generator-family, attack-type, and source-domain heads to a shared encoder and balances the four task losses (the binary loss plus three auxiliary losses) with learned homoscedastic uncertainty weights. To improve robustness, an exponential-moving-average (EMA) teacher predicts on clean inputs while an attack-augmented student is distilled toward the teacher's predictions. MELD further uses a hard-negative pairwise ranking loss to enlarge the score margin between AI-generated texts and the most confusable human texts. At inference, all auxiliary heads are discarded, so MELD has the same interface and cost as a standard binary detector. On the public RAID leaderboard, MELD is the strongest open-source detector and is competitive with leading commercial models, especially under attack and at low FPR. Across standard held-out benchmarks, MELD matches or outperforms supervised baselines. We further introduce MELD-eval, a held-out evaluation pool built from recent chat models released by four major LLM providers. Without additional fine-tuning, MELD achieves a 99.9% true-positive rate (TPR) at 1% FPR on MELD-eval, while many baselines degrade sharply.
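The abstract names three training ingredients without giving their formulas. The block below is a minimal pure-Python sketch of standard forms of those ingredients, not MELD's released code: the uncertainty weighting uses the common form sum_i exp(-s_i) L_i + s_i with s_i = log(sigma_i^2); the ranking loss is a hinge against the k highest-scoring human texts; and distillation is a binary cross-entropy from the student's prediction on an attacked text to the teacher's prediction on the clean text. The function names, `margin`, `k`, and `momentum` are all hypothetical, and plain floats stand in for tensors and learnable parameters.

```python
import math

# Hypothetical sketch of the training ingredients named above; NOT MELD's code.
# Scores and probabilities are plain floats; in a real trainer they would be tensors
# and the log-variances s_i would be learnable parameters.

def uncertainty_weighted_total(task_losses, log_vars):
    """Common homoscedastic-uncertainty weighting: sum_i exp(-s_i) * L_i + s_i,
    where s_i = log(sigma_i^2). A large s_i down-weights task i, and the +s_i
    term stops s_i from growing without bound."""
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task loss")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def hard_negative_ranking_loss(ai_scores, human_scores, margin=1.0, k=2):
    """Hinge loss asking every AI score to exceed each of the k highest-scoring
    (most confusable) human scores by at least `margin`. margin and k are assumptions."""
    hardest = sorted(human_scores, reverse=True)[:k]
    pairs = [(a, h) for a in ai_scores for h in hardest]
    return sum(max(0.0, margin - (a - h)) for a, h in pairs) / len(pairs)


def distillation_loss(teacher_prob_clean, student_prob_attacked, eps=1e-7):
    """Binary cross-entropy of the student's P(AI) on the attacked text against the
    teacher's soft P(AI) on the clean text (one possible distillation target)."""
    p = min(max(student_prob_attacked, eps), 1.0 - eps)
    t = teacher_prob_clean
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def ema_update(teacher_params, student_params, momentum=0.999):
    """EMA teacher update, parameter by parameter: t <- m * t + (1 - m) * s."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    # Placeholder losses for the binary, generator, attack and domain heads.
    print(uncertainty_weighted_total([0.3, 1.2, 0.9, 0.6], [0.0, 0.5, 0.5, 0.2]))
    print(hard_negative_ranking_loss([2.0, 0.5], [1.0, 0.0, -1.0]))  # 0.5
    print(distillation_loss(0.9, 0.7))
    print(ema_update([1.0, 2.0], [0.0, 0.0], momentum=0.9))
```

In this form, a head whose loss stays noisy can raise its s_i and shrink its weight exp(-s_i), which is one way a four-head objective can be kept from being dominated by the auxiliary tasks once the binary task saturates.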
Related benchmarks
| Task | Dataset | Metric | Value (%) | Rank |
|---|---|---|---|---|
| Machine-generated text detection | MAGE | AUROC (avg) | 99.1 | 24 |
| LLM-generated text detection | DetectRL | -- | -- | 12 |
| AI text detection | M4GT | AUROC | 78 | 10 |
| AI text detection | Ghostbuster | AUROC | 100 | 10 |
| AI text detection | MELD-eval | AUROC | 99.99 | 10 |
| LLM-generated text detection | MELD-eval (GPT-5.4-Mini) | TPR @ 1% FPR | 100 | 10 |
| LLM-generated text detection | MELD-eval (Gemini-3-Flash) | TPR @ 1% FPR | 99.7 | 10 |
| LLM-generated text detection | MELD-eval (Claude-Haiku-4.5) | TPR @ 1% FPR | 100 | 10 |
| LLM-generated text detection | MELD-eval (Qwen-3.6-Plus) | TPR @ 1% FPR | 99.9 | 10 |
| LLM-generated text detection | MELD-eval (overall) | TPR @ 1% FPR | 99.9 | 10 |
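The table reports two metrics, AUROC and TPR @ 1% FPR. As an illustration only, the sketch below computes both from raw detector scores, assuming higher scores mean "more likely AI-generated". The function names are hypothetical, and the exact thresholding and tie-breaking conventions used by each leaderboard are not stated here, so this is one common convention rather than the official scorer.

```python
import math

# Hedged sketch of the two metrics reported in the table; leaderboards may use
# slightly different thresholding or tie-breaking conventions.

def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """TPR at a threshold chosen on human scores so that at most
    floor(target_fpr * n_human) human texts score strictly above it."""
    ranked = sorted(human_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ranked))  # false positives we may admit
    threshold = ranked[allowed] if allowed < len(ranked) else -math.inf
    return sum(s > threshold for s in ai_scores) / len(ai_scores)


def auroc(human_scores, ai_scores):
    """Probability that a random AI text outscores a random human text (ties count 1/2)."""
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


if __name__ == "__main__":
    humans = [i / 100 for i in range(100)]  # synthetic human scores 0.00 .. 0.99
    ais = [0.995, 0.5, 2.0, 3.0]             # synthetic AI scores
    print(tpr_at_fpr(humans, ais))            # 0.75
    print(auroc([0.1, 0.4], [0.4, 0.9]))      # 0.875
```

Because the threshold is set from human scores alone, TPR @ 1% FPR depends heavily on the few highest-scoring human texts; a detector can have near-perfect AUROC and still lose TPR at this operating point if a handful of human texts score very high.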