READER: Reasoning-Enhanced AI-Generated Text Detection
About
Recent advances in large language models (LLMs) have made it increasingly difficult to distinguish human-written text from AI-generated content. Many existing detectors train supervised neural classifiers that achieve strong in-distribution performance but are often opaque and can degrade substantially under distribution shift. We present READER, a reasoning-enhanced AI text detector that outputs both a human/AI label and a structured rationale describing the evidence for its decision. A key component of our approach is READ, a curated supervision set of rationales and verdicts. We fine-tune an LLM on READ to build READER, which reasons before detecting at inference time. Despite having only 1.5B parameters, READER consistently outperforms existing detectors as well as prompted, high-capacity LLM baselines (GPT-5.2, Gemini-3-Pro, and DeepSeek-V3.2), which are 100 to 1000 times larger in scale.
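To make the "label plus structured rationale" output concrete, here is a minimal sketch of how a client might parse such a response. The JSON shape, the field names `rationale` and `verdict`, and the helper `parse_detection` are all hypothetical and chosen for illustration only. The abstract does not specify READER's actual output format.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    rationale: List[str]  # evidence statements produced before the verdict
    label: str            # normalized to "human" or "ai"


def parse_detection(raw: str) -> Detection:
    """Parse a response in an ASSUMED JSON schema (not READER's documented one):
    {"rationale": ["...", "..."], "verdict": "human" | "ai"}
    """
    data = json.loads(raw)
    label = str(data["verdict"]).strip().lower()
    if label not in {"human", "ai"}:
        raise ValueError(f"unexpected verdict: {label!r}")
    rationale = [str(item) for item in data.get("rationale", [])]
    return Detection(rationale=rationale, label=label)


# Hypothetical model response: the rationale is listed first, then the verdict.
example = (
    '{"rationale": ["Uniform sentence length", "Generic hedging phrases"], '
    '"verdict": "AI"}'
)
result = parse_detection(example)
print(result.label, len(result.rationale))
```

Keeping the rationale as a list of separate evidence statements, rather than one free-text string, would make it easier to inspect which cues drove a given verdict.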
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| AI-generated text detection | READ (test) | Accuracy | 95.3 | 55 |
| Out-of-distribution detection | OOD Detection Source LLM: Gemini-2.5-Flash (test) | XSUM Score | 96 | 19 |
| Out-of-distribution detection | OOD Detection Source LLM: Claude-3.5-Haiku (test) | XSUM Score | 0.977 | 19 |
| Out-of-distribution detection | OOD Detection Source LLM: GPT-4o (test) | XSUM Score | 98.7 | 19 |
| Out-of-distribution AI-generated text detection | Grok 4.1: OOD unseen domains (Legal, Email, Complaints) (test) | Accuracy (Legal) | 99 | 16 |
| Out-of-distribution AI-generated text detection | Kimi K2.5: OOD unseen domains (test) | Accuracy (Legal) | 98.7 | 16 |
| Out-of-distribution AI-generated text detection | Mercury 2: OOD unseen domains (test) | Accuracy (Legal) | 95.7 | 16 |
| Out-of-distribution AI-generated text detection | Mistral-Medium 3: OOD unseen domains (test) | Accuracy (Legal) | 99 | 16 |
| AI-generated text detection | Cross-lingual and adversarial robustness benchmark (Normal) | Accuracy | 93.4 | 1 |
| AI-generated text detection | Cross-lingual and adversarial robustness benchmark (Mixed attack) | Accuracy | 92.1 | 1 |