TALL: Thumbnail Layout for Deepfake Video Detection

About

The growing threat that deepfakes pose to society and cybersecurity has raised enormous public concern, and increasing effort has been devoted to the critical topic of deepfake video detection. Existing video-based methods achieve good performance but are computationally intensive. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout that preserves spatial and temporal dependencies. Specifically, consecutive frames are masked at a fixed position in each frame to improve generalization, then resized into sub-images and rearranged into a pre-defined layout to form the thumbnail. TALL is model-agnostic and extremely simple, requiring changes to only a few lines of code. Inspired by the success of vision transformers, we incorporate TALL into Swin Transformer, forming an efficient and effective method, TALL-Swin. Extensive intra-dataset and cross-dataset experiments validate the effectiveness and superiority of TALL and the state-of-the-art TALL-Swin. TALL-Swin achieves 90.79% AUC on the challenging cross-dataset task FaceForensics++ → Celeb-DF. The code is available at https://github.com/rainy-xu/TALL4Deepfake.
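As a rough illustration of the thumbnail construction described above (mask at a shared position, resize to sub-images, tile into a fixed layout), here is a minimal NumPy sketch. It is not the authors' implementation, which lives in the repository linked above. The 2x2 grid, 224-pixel output, zero-valued rectangular mask, nearest-neighbour resizing, row-major tile order, and the helper names `tall_thumbnail` and `resize_nearest` are all hypothetical choices made for this example.

```python
import numpy as np


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array; a stand-in for a real resizer."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def tall_thumbnail(frames, grid=(2, 2), out_size=224, mask_size=None, rng=None):
    """Tile T = rows * cols consecutive frames of shape (T, H, W, C) into one thumbnail.

    Hypothetical sketch: grid, out_size, mask_size and the zero-valued mask are
    illustrative assumptions, not settings confirmed by the paper.
    """
    n_rows, n_cols = grid
    t, h, w, c = frames.shape
    if t != n_rows * n_cols:
        raise ValueError(f"expected {n_rows * n_cols} frames, got {t}")
    frames = frames.copy()  # do not modify the caller's clip

    # Mask one block at the SAME position in every frame of the clip.
    if mask_size is not None:
        mh, mw = mask_size
        if mh > h or mw > w:
            raise ValueError("mask larger than frame")
        if rng is None:
            rng = np.random.default_rng()
        y = int(rng.integers(0, h - mh + 1))  # top edge, drawn once per clip
        x = int(rng.integers(0, w - mw + 1))  # left edge, drawn once per clip
        frames[:, y:y + mh, x:x + mw, :] = 0

    # Resize each frame to a sub-image and place it in row-major order.
    sub_h, sub_w = out_size // n_rows, out_size // n_cols
    thumb = np.zeros((sub_h * n_rows, sub_w * n_cols, c), dtype=frames.dtype)
    for i in range(t):
        r, q = divmod(i, n_cols)  # tile row and tile column for frame i
        thumb[r * sub_h:(r + 1) * sub_h, q * sub_w:(q + 1) * sub_w] = resize_nearest(
            frames[i], sub_h, sub_w
        )
    return thumb


# Usage: four dummy 64x48 RGB frames -> one 224x224 thumbnail.
clip = np.full((4, 64, 48, 3), 255, dtype=np.uint8)
thumb = tall_thumbnail(clip, mask_size=(16, 16), rng=np.random.default_rng(0))
print(thumb.shape)  # (224, 224, 3)
```

Because the output is an ordinary single image, it can be passed unchanged to a 2D image backbone such as Swin Transformer, which is one way to read the model-agnostic claim. Drawing the mask position once per clip, rather than once per frame, keeps the occluded region aligned across all tiles.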

Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He • 2023

Related benchmarks

Task	Dataset	Metric	Result
Deepfake Detection	DFDC	AUC	76.8%	230
Deepfake Detection	DFD	AUC	0.892	193
Deepfake Detection	CelebDF v2	AUC	0.871	134
Deepfake Detection	DFDC (test)	AUC	76.8%	130
Deepfake Detection	CDF v2	AUC	0.908	97
AI-generated Video Detection	EA-Video seen (evaluation)	Accuracy	83.6%	88
Deepfake Detection	CDFv1, CDFv2, DFD, DFDCP, DFDC (test)	Overall Average Score	55.786	74
Deepfake Detection	Celeb-DF v2 (test)	Video-level AUC	0.908	68
Deepfake Detection	FaceForensics++ (test)	AUC	99.87%	65
Image Deepfake Detection	DFo	AUC	0.7423	62
Image Deepfake Detection	DFo	AUC0.7423	62

Showing 10 of 121 rows

...
