
Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline

About

Existing facial forgery detection methods typically focus on binary classification or pixel-level localization, providing little semantic insight into the nature of the manipulation. To address this, we introduce Forgery Attribution Report Generation, a new multimodal task that jointly localizes forged regions ("Where") and generates natural language explanations grounded in the editing process ("Why"). This dual-focus approach goes beyond traditional forensics, providing a comprehensive understanding of the manipulation. To enable research in this domain, we present Multi-Modal Tamper Tracing (MMTT), a large-scale dataset of 152,217 samples, each with a process-derived ground-truth mask and a human-authored textual description, ensuring high annotation precision and linguistic richness. We further propose ForgeryTalker, a unified end-to-end framework that integrates vision and language via a shared encoder (image encoder + Q-former) and dual decoders for mask and text generation, enabling coherent cross-modal reasoning. Experiments show that ForgeryTalker achieves competitive performance on both report generation and forgery localization subtasks, i.e., 59.3 CIDEr and 73.67 IoU, respectively, establishing a baseline for explainable multimedia forensics. Dataset and code will be released to foster future research.
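The abstract describes ForgeryTalker as a shared vision encoder (image encoder plus Q-former) feeding two decoders, one for the forgery mask ("Where") and one for the textual report ("Why"). The sketch below illustrates that dual-decoder data flow in PyTorch; all module choices, names, and dimensions are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class ForgeryTalkerSketch(nn.Module):
    """Illustrative stand-in for the shared-encoder / dual-decoder design.

    Assumptions (not from the paper): a conv patch embedding stands in for
    the image encoder, a single TransformerDecoderLayer stands in for the
    Q-former, and linear heads stand in for the mask and text decoders.
    """

    def __init__(self, d=256, n_queries=32, vocab=1000, mask_hw=16):
        super().__init__()
        # Patch embedding as a stand-in for the image encoder.
        self.encoder = nn.Conv2d(3, d, kernel_size=16, stride=16)
        # Learnable queries that cross-attend to patch features (Q-former role).
        self.queries = nn.Parameter(torch.randn(n_queries, d))
        self.qformer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        # Dual decoders: "Where" (mask logits) and "Why" (token logits).
        self.mask_decoder = nn.Linear(d, mask_hw * mask_hw)
        self.text_decoder = nn.Linear(d, vocab)

    def forward(self, img):
        b = img.shape[0]
        patches = self.encoder(img).flatten(2).transpose(1, 2)  # (B, N, d)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        fused = self.qformer(q, patches)          # shared cross-modal features
        mask = self.mask_decoder(fused.mean(1))   # (B, mask_hw * mask_hw)
        text = self.text_decoder(fused)           # (B, n_queries, vocab)
        return mask, text

model = ForgeryTalkerSketch()
mask_logits, text_logits = model(torch.randn(2, 3, 224, 224))
```

Both heads read from the same fused query features, which is what lets mask and report stay consistent with each other; a real implementation would replace the linear heads with a segmentation decoder and an autoregressive language model.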

Jingchun Lian, Lingyu Liu, Yaxiong Wang, Yujiao Wu, Lianwei Wu, Li Zhu, Zhedong Zheng• 2024

Related benchmarks

Task                       Dataset                         Metric   Result   Rank
Report Generation          MMTT                            CIDEr    59.3     11
Interpretation Generation  MMTT (test)                     CIDEr    59.3     10
Forgery Localization       MMTT                            IoU      73.67    6
Report Generation          DQ_F++ zero-shot 2024b          BLEU-1   48.5     4
Report Generation          SynthScars face-modification    BLEU-1   10.8     3
Forgery Localization       MMTT (test)                     IoU      73.67    3
