AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

About

Detecting AI-generated text is becoming increasingly challenging as modern language models approach human-level fluency and can evade detectors that rely on surface statistics or likelihood-based signals. We propose AEyeDE, an attribution-driven approach to human-AI authorship detection that leverages model attention as a discriminative signal. Specifically, we extract attention-based attribution matrices for both human- and AI-generated text using a proxy Transformer model with white-box access, and we train a lightweight Convolutional Neural Network to learn representations from these attribution maps. Across encoder-decoder translation settings, our method consistently outperforms a text-only baseline. In decoder-only settings, it performs strongly in generator-specific detection, remains competitive on standard benchmarks, and shows robustness under cross-dataset transfer and alternative-spelling perturbations. We further show that attention maps exhibit recurring local structures whose relative frequencies differ consistently between human- and AI-generated text across datasets and proxy models. These findings suggest that attention-based attribution maps provide a complementary and interpretable signal for AI-generated text detection. We will make the code publicly available to support future research.
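To make the pipeline concrete (attention from a proxy Transformer, turned into an attribution map, then passed through a small CNN), here is a minimal pure-Python sketch. The abstract does not specify the attribution method, so this sketch substitutes attention rollout (Abnar and Zuidema, 2020) as an assumption. The helper names (`attention_rollout`, `conv2d_valid`, `cnn_features`) and the hand-set kernels are hypothetical and for illustration only. In the described method, the CNN weights would be learned from labeled human and AI attribution maps, not fixed by hand.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attention_rollout(layers: List[Matrix]) -> Matrix:
    """Attention rollout over head-averaged, row-stochastic attention matrices.

    Assumption: a stand-in for the paper's attribution method, which the
    abstract does not name. `layers` is ordered from first to last layer.
    """
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # Model the residual stream by mixing attention with the identity,
        # then re-normalize each row so it stays a probability distribution.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        mixed = [[v / sum(row) for v in row] for row in mixed]
        # Compose with the layers below: rollout = A_l @ ... @ A_1.
        rollout = matmul(mixed, rollout)
    return rollout


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D cross-correlation with no padding ("valid" mode)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(x[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(len(x[0]) - kw + 1)]
        for i in range(len(x) - kh + 1)
    ]


def cnn_features(attr: Matrix, kernels: List[Matrix]) -> List[float]:
    """One conv layer + ReLU + global max pooling, one feature per kernel.

    Global pooling makes the feature length independent of the text length.
    """
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(attr, kernel)
        features.append(max(max(0.0, v) for row in fmap for v in row))
    return features


if __name__ == "__main__":
    # Toy causal (lower-triangular) attention for 3 tokens and 2 layers.
    layer1 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]
    layer2 = [[1.0, 0.0, 0.0], [0.3, 0.7, 0.0], [0.1, 0.1, 0.8]]
    attr = attention_rollout([layer1, layer2])
    # Hypothetical hand-set kernels; a trained CNN would learn these instead.
    kernels = [[[1.0, -1.0], [-1.0, 1.0]], [[0.25, 0.25], [0.25, 0.25]]]
    print(cnn_features(attr, kernels))
```

With a decoder-only proxy, the attention is causal, so the rollout map stays lower-triangular. Each kernel acts as a detector for one local structure in that map. A classifier head (for example, a logistic layer) trained on these pooled features would then produce the human-vs-AI decision.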

Aria Nourbakhsh, Adelaide Danilov, Christoph Schommer, Salima Lamsiyah• 2026

Related benchmarks

Task | Dataset | Result | Rank
AI Text Detection | RAID Cohere 1.0 (test) | Accuracy 97.23 | 7
AI Text Detection | RAID GPT-neo 1.0 (test) | Accuracy 97.22 | 7
AI Text Detection | RAID Llama 1.0 (test) | Accuracy 98.99 | 7
AI Text Detection | RAID Mistral 1.0 (test) | Accuracy 95.09 | 7
AI-generated text detection | RAID paraphrasing adversarial (unified setting) | Accuracy 67.52 | 7
AI-generated text detection | RAID alternative-spelling adversarial (unified setting) | Accuracy 70.34 | 7
AI-generated text detection | Beemo | Accuracy 66.36 | 7
AI-generated text detection | RAID alternative-spelling adversarial (individual setting) | Accuracy 97.93 | 7
AI-generated text detection | RAID paraphrasing (individual setting) | Accuracy 72.68 | 7
AI-generated translation detection | Marian-MT ar-en (test) | Accuracy 71.9 | 3

(10 of 12 rows shown.)
