AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection
About
Detecting AI-generated text is becoming increasingly challenging as modern language models approach human-level fluency and can evade detectors that rely on surface statistics or likelihood-based signals. We propose AEyeDE, an attribution-driven approach to human-AI authorship detection that leverages model attention as a discriminative signal. Specifically, we extract attention-based attribution matrices for both human- and AI-generated text using a proxy Transformer model with white-box access and train a lightweight Convolutional Neural Network to learn representations from these attribution maps. Across encoder-decoder translation settings, our method consistently outperforms a text-only baseline. In decoder-only settings, it performs strongly in generator-specific detection, remains competitive on standard benchmarks, and shows robustness under cross-dataset transfer and alternative-spelling perturbations. We further show that attention maps exhibit recurring local structures whose relative frequencies differ consistently between human- and AI-generated text across datasets and proxy models. These findings suggest that attention-based attribution maps provide a complementary and interpretable signal for AI-generated text detection. We will make the code publicly available to support future research.
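To make the pipeline in the abstract more concrete, the following is a minimal, dependency-free sketch of how a text could be turned into a fixed-size, image-like attention map of the kind a small CNN might consume. It is not the authors' implementation. The abstract does not specify how the attribution matrices are computed, so this sketch assumes plain scaled dot-product attention weights averaged over heads, followed by average pooling to a fixed grid. All function names (`attention_map`, `mean_over_heads`, `pool_to_grid`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Scaled dot-product attention weights, softmax(Q K^T / sqrt(d)), for one head."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]


def mean_over_heads(maps):
    """Average several same-shaped head maps into a single matrix."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[i][j] for m in maps) / len(maps) for j in range(cols)]
        for i in range(rows)
    ]


def _bins(length, size):
    """Split range(length) into `size` non-empty index ranges (reused if length < size)."""
    out = []
    for b in range(size):
        start = b * length // size
        stop = max((b + 1) * length // size, start + 1)
        out.append(range(start, stop))
    return out


def pool_to_grid(matrix, size):
    """Average-pool a matrix to size x size so texts of any length share one input shape."""
    row_bins = _bins(len(matrix), size)
    col_bins = _bins(len(matrix[0]), size)
    return [
        [
            sum(matrix[i][j] for i in rb for j in cb) / (len(rb) * len(cb))
            for cb in col_bins
        ]
        for rb in row_bins
    ]


# Toy per-token vectors for 4 tokens and 2 heads (made-up numbers, not real model output).
head_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
head_b = [[0.2, 0.1], [0.4, -0.3], [0.0, 0.9], [1.0, 0.0]]
maps = [attention_map(h, h) for h in (head_a, head_b)]
grid = pool_to_grid(mean_over_heads(maps), size=2)
```

In a real setting the per-head maps would come from the proxy Transformer rather than from toy vectors, and `grid` (one per text, labeled human or AI) would be the input to the classifier. Pooling to a fixed grid is only one way to handle variable text length. Padding or cropping are alternatives, and the abstract does not state which approach the paper uses.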
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| AI Text Detection | RAID Cohere 1.0 (test) | Accuracy | 97.23 | 7 |
| AI Text Detection | RAID GPT-neo 1.0 (test) | Accuracy | 97.22 | 7 |
| AI Text Detection | RAID Llama 1.0 (test) | Accuracy | 98.99 | 7 |
| AI Text Detection | RAID Mistral 1.0 (test) | Accuracy | 95.09 | 7 |
| AI-generated text detection | RAID paraphrasing adversarial unified setting | Accuracy | 67.52 | 7 |
| AI-generated text detection | RAID alternative-spelling adversarial (unified setting) | Accuracy | 70.34 | 7 |
| AI-generated text detection | Beemo | Accuracy | 66.36 | 7 |
| AI-generated text detection | RAID alternative-spelling adversarial (individual setting) | Accuracy | 97.93 | 7 |
| AI-generated text detection | RAID paraphrasing individual setting | Accuracy | 72.68 | 7 |
| AI-generated translation detection | Marian-MT ar-en (test) | Accuracy | 71.9 | 3 |