Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

About

Deepfake detection remains a formidable challenge due to the complex and evolving nature of fake content in real-world scenarios. Existing academic benchmarks, however, diverge sharply from industrial practice: they typically feature homogeneous training sources and low-quality testing images, which hinders the practical deployment of current detectors. To mitigate this gap, we introduce HydraFake, a dataset that simulates real-world challenges with hierarchical generalization testing. Specifically, HydraFake involves diversified deepfake techniques and in-the-wild forgeries, along with rigorous training and evaluation protocols covering unseen model architectures, emerging forgery techniques, and novel data domains. Building on this resource, we propose Veritas, a deepfake detector based on a multi-modal large language model (MLLM). Unlike vanilla chain-of-thought (CoT), we introduce pattern-aware reasoning that involves critical reasoning patterns such as "planning" and "self-reflection" to emulate the human forensic process. We further propose a two-stage training pipeline to seamlessly internalize such deepfake reasoning capacities into current MLLMs. Experiments on the HydraFake dataset reveal that although previous detectors generalize well in cross-model scenarios, they fall short on unseen forgeries and data domains. Veritas achieves significant gains across different out-of-distribution (OOD) scenarios and is capable of delivering transparent and faithful detection outputs.
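The hierarchical generalization testing mentioned in the abstract can be pictured as grouping test predictions by scenario and scoring each group separately. The following is a minimal sketch under stated assumptions, not the authors' protocol code: the scenario names, the record layout, the function `accuracy_by_scenario`, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical scenario labels, loosely following the abstract: in-domain data,
# unseen model architectures, emerging forgery techniques, novel data domains.
SCENARIOS = ("in_domain", "cross_model", "cross_forgery", "cross_domain")


def accuracy_by_scenario(records):
    """Return {scenario: accuracy} for (scenario, true_label, predicted_label) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for scenario, y_true, y_pred in records:
        if scenario not in SCENARIOS:
            raise ValueError(f"unknown scenario: {scenario}")
        total[scenario] += 1
        correct[scenario] += int(y_true == y_pred)
    # Only report scenarios that actually have test samples.
    return {s: correct[s] / total[s] for s in SCENARIOS if total[s] > 0}


# Toy predictions, made up for illustration (1 = fake, 0 = real).
toy_records = [
    ("in_domain", 1, 1),
    ("in_domain", 0, 0),
    ("cross_model", 1, 1),
    ("cross_model", 0, 1),
    ("cross_forgery", 1, 0),
    ("cross_forgery", 0, 0),
    ("cross_domain", 1, 0),
    ("cross_domain", 0, 1),
]

scores = accuracy_by_scenario(toy_records)
```

Reporting one score per scenario, rather than a single pooled number, is what makes a drop on unseen forgeries or unseen domains visible even when in-domain accuracy stays high.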

Hao Tan, Jun Lan, Zichang Tan, Ajian Liu, Chuanbiao Song, Senyuan Shi, Huijia Zhu, Weiqiang Wang, Jun Wan, Zhen Lei • 2025

Related benchmarks

Task	Dataset	Result
Deepfake Detection	CelebDF v2	AUC 0.079	134
AI-generated image detection	AIGI-Now	FLUX-dev Pixel Score 0.847	49
AIGI Detection	BFree Online	B.Acc 55.2	47
Synthetic Image Detection	Chameleon	Accuracy 59.6	36
Image-level manipulation detection	DEFACTO 12k	AUC 0.9	26
Image-level Document Forgery Detection	DocTamper FCD	Accuracy 39.4	24
Deepfake Detection	FaceForensics++ c40 (test)	AUC 4.4	24
Deepfake Detection	FaceShifter (FSh)	AUC 2.3	23
Image Deepfake Detection	WDF	AUC 0.063	23
Image Forgery Classification	ForenSynths	Accuracy 69.5	19

Showing 10 of 53 rows
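
For readers unfamiliar with the metrics named in the table, the sketch below shows one common way to compute AUC and balanced accuracy (B.Acc) for a binary real/fake detector. It is an illustrative pure-Python sketch, not the evaluation code behind the listed results: the helper names `roc_auc` and `balanced_accuracy`, the 0.5 decision threshold, and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random fake scores above a random real (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, preds):
    """Mean of the true-positive rate and the true-negative rate."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy data, made up for illustration (1 = fake, 0 = real).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

auc = roc_auc(labels, scores)
bacc = balanced_accuracy(labels, preds)
```

AUC is threshold-free and depends only on how the scores rank fakes against reals, whereas balanced accuracy depends on the chosen threshold and weights both classes equally regardless of class imbalance.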
