Detecting AI-Generated Images via Contextual Anomaly Estimation in Masked AutoEncoders

About

Context-based detection methods such as DetectGPT generalize well in identifying AI-generated text because they evaluate how compatible the content is with a model's learned distribution. In contrast, existing image detectors rely on discriminative features from pretrained backbones such as CLIP, which implicitly capture generator-specific artifacts. As modern generative models rapidly advance in visual fidelity, the artifacts these detectors depend on are becoming increasingly subtle or absent, which undermines their reliability. Masked AutoEncoders (MAE) are trained to reconstruct masked patches from visible context, so they naturally model patch-level contextual plausibility in a way akin to conditional probability estimation, while their encoder also serves as a powerful semantic feature extractor. We propose CINEMAE, a novel architecture that exploits both capabilities of MAE for AI-generated image detection: we derive per-patch anomaly signals from the reconstruction mechanism and extract global semantic features from the encoder, fusing context-based and feature-based cues for robust detection. CINEMAE achieves highly competitive mean accuracies of 96.63% on GenImage and 93.96% on AIGCDetectBenchmark, and maintains over 93% accuracy even under JPEG compression at QF=50.
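To make the idea of a per-patch anomaly signal concrete, here is a minimal sketch in plain Python. It is not the CINEMAE implementation: the abstract does not specify how the anomaly signal is computed, so everything below is an assumption for illustration. The helper names (`split_patches`, `toy_reconstruct`, `patch_anomaly_scores`) are hypothetical, and `toy_reconstruct` is a crude stand-in for an MAE decoder that simply predicts a masked patch as the mean intensity of the visible patches. The encoder features and the fusion step are not shown.

```python
from statistics import mean


def split_patches(img, p):
    """Split an HxW grid (list of lists) into {(patch_row, patch_col): flat pixel list}."""
    h, w = len(img), len(img[0])
    return {
        (r // p, c // p): [img[r + i][c + j] for i in range(p) for j in range(p)]
        for r in range(0, h, p)
        for c in range(0, w, p)
    }


def toy_reconstruct(patches, masked):
    """Hypothetical stand-in for an MAE decoder (assumption, not the paper's model):
    predict every pixel of the masked patch as the mean of all visible pixels."""
    visible = [v for key, px in patches.items() if key != masked for v in px]
    fill = mean(visible)
    return [fill] * len(patches[masked])


def patch_anomaly_scores(img, p, reconstruct=toy_reconstruct):
    """Mask each patch in turn, reconstruct it from the remaining context,
    and use the mean squared reconstruction error as its anomaly score."""
    patches = split_patches(img, p)
    scores = {}
    for key, px in patches.items():
        pred = reconstruct(patches, key)
        scores[key] = mean((a - b) ** 2 for a, b in zip(px, pred))
    return scores


if __name__ == "__main__":
    # 4x4 image with a uniform background and one bright 2x2 patch at bottom-right.
    img = [
        [0.5, 0.5, 0.5, 0.5],
        [0.5, 0.5, 0.5, 0.5],
        [0.5, 0.5, 1.0, 1.0],
        [0.5, 0.5, 1.0, 1.0],
    ]
    scores = patch_anomaly_scores(img, p=2)
    print(max(scores, key=scores.get))  # patch that fits its context worst
```

In this toy setting, the patch that is hardest to predict from its surroundings receives the highest score. A real detector would replace `toy_reconstruct` with a trained MAE and would combine the resulting scores with encoder features, as the abstract describes.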

Minsuk Jang, Hyunseo Jeong, Minseok Son, Changick Kim • 2025

Related benchmarks

Task	Dataset	Result
AIGC Detection	AIGCDetectBenchmark	Accuracy: 93.96	50
AI-generated image detection	GenImage SD v1.4 March 2024	Detection Accuracy (MJ): 95.9	14
AI-generated image detection	Chameleon SD v1.4	Overall Accuracy: 61.8	10
AI-generated image detection	Chameleon (Full)	Overall Accuracy: 62.83	10
