Ivy-Fake: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection

About

The rapid development of Artificial Intelligence Generated Content (AIGC) techniques has enabled the creation of high-quality synthetic content, but it also raises significant security concerns. Current detection methods face two major limitations. (1) There is a lack of multidimensional, explainable datasets for generated images and videos: existing open-source datasets (e.g., WildFake, GenVideo) rely on oversimplified binary annotations, which restrict the explainability and trustworthiness of trained detectors. (2) Prior MLLM-based forgery detectors (e.g., FakeVLM) offer insufficiently fine-grained interpretability in their step-by-step reasoning, which hinders reliable localization and explanation. To address these challenges, we introduce Ivy-Fake, the first large-scale multimodal benchmark for explainable AIGC detection. It consists of over 106K richly annotated training samples (images and videos) and 5,000 manually verified evaluation examples, sourced from multiple generative models and real-world datasets through a carefully designed pipeline to ensure both diversity and quality. Furthermore, we propose Ivy-xDetector, a reinforcement learning model based on Group Relative Policy Optimization (GRPO), capable of producing explainable reasoning chains and achieving robust performance across multiple synthetic content detection benchmarks. Extensive experiments demonstrate the superiority of our dataset and confirm the effectiveness of our approach. Notably, our method improves performance on GenImage from 86.88% to 96.32%, surpassing prior state-of-the-art methods by a clear margin.
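The abstract names GRPO but does not describe Ivy-xDetector's reward design or training loop. As a rough illustration of the general GRPO idea only (not the authors' implementation), the sketch below samples a group of responses for one input, scores each with a reward, and normalizes each reward against the group mean and standard deviation, so no learned value function (critic) is needed. The reward function `detection_reward` and its weights are hypothetical assumptions. The surrogate is a simplified sequence-level version of the clipped objective; real GRPO applies the ratio per token and adds a KL penalty toward a reference model, both omitted here.

```python
import math
import statistics


def detection_reward(predicted: str, label: str, well_formatted: bool) -> float:
    """Hypothetical reward: +1 for the correct real/fake verdict,
    +0.5 if the response follows the expected reasoning format.
    These weights are illustrative assumptions, not taken from the paper."""
    return (1.0 if predicted == label else 0.0) + (0.5 if well_formatted else 0.0)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one group of responses to the same input:
    (reward - group mean) / group std. Uses the population std here;
    implementations differ on this detail. eps avoids division by zero
    when every response in the group receives the same reward."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective term (to be maximized), simplified to one
    sequence-level log-probability per response."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Four sampled responses for one image whose ground-truth label is "fake":
    # (predicted verdict, whether the reasoning format was followed)
    group = [("fake", True), ("real", True), ("fake", False), ("fake", True)]
    rewards = [detection_reward(p, "fake", ok) for p, ok in group]
    print("rewards:", rewards)  # [1.5, 0.5, 1.0, 1.5]
    print("advantages:", [round(a, 3) for a in group_relative_advantages(rewards)])
```

Because advantages are centered within each group, responses that beat their siblings are reinforced and weaker ones are discouraged, even when every response in the group is partly correct.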

Changjiang Jiang, Wenhui Dong, Zhonghao Zhang, Fengchang Yu, Wei Peng, Xinbin Yuan, Yifei Bi, Ming Zhao, Zian Zhou, Chenyang Si, Caifeng Shan• 2025

Related benchmarks

Task	Dataset	Result
Generated Image Detection	GenImage (test)	Average Accuracy: 96.32	135
Deepfake Detection	CelebDF v2	AUC: 0.536	134
AI-generated image detection	AIGI-Now	FLUX-dev Pixel Score: 0.9559	49
AIGI Detection	BFree Online	B.Acc: 65.5	47
Synthetic Image Detection	Chameleon	Accuracy: 73.2	36
Image-level manipulation detection	DEFACTO 12k	AUC: 30.9	26
Deepfake Detection	FaceForensics++ c40 (test)	AUC: 83.1	24
Image-level Document Forgery Detection	DocTamper FCD	Accuracy: 33.9	24
Image Deepfake Detection	WDF	AUC: 0.821	23
Deepfake Detection	FaceShifter (FSh)	AUC: 67.3	23

The table lists 10 of 47 reported benchmark results.
