ForensicFormer: Hierarchical Multi-Scale Reasoning for Cross-Domain Image Forgery Detection
About
The proliferation of AI-generated imagery and sophisticated editing tools has rendered traditional forensic methods ineffective for cross-domain forgery detection. We present ForensicFormer, a hierarchical multi-scale framework that unifies low-level artifact detection, mid-level boundary analysis, and high-level semantic reasoning via cross-attention transformers. Prior single-paradigm approaches achieve below 75% accuracy on out-of-distribution datasets. Our method maintains 86.8% average accuracy across seven diverse test sets spanning traditional manipulations, GAN-generated images, and diffusion-model outputs, a significant improvement over state-of-the-art universal detectors. We demonstrate superior robustness to JPEG compression (83% accuracy at quality factor Q=70 vs. 66% for baselines) and provide pixel-level forgery localization with an F1 score of 0.76. Extensive ablation studies show that each hierarchical component contributes a 4-10% accuracy improvement, and qualitative analysis reveals interpretable forensic features aligned with human expert reasoning. Our work bridges classical image forensics and modern deep learning, offering a practical solution for real-world deployment where manipulation techniques are not known a priori.
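The abstract names cross-attention as the mechanism that joins the three feature levels but does not specify how. The block below is a minimal, hypothetical sketch of one common way to do this. It makes several assumptions: the high-level semantic tokens act as queries, the concatenated low- and mid-level tokens serve as both keys and values, there is a single head with no learned projections, a residual connection is added, and a mean-pooled sigmoid head produces a forgery score. The function names (`cross_attention`, `hierarchical_fusion`, `forgery_score`) are illustrative only. This is not ForensicFormer's published architecture.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, context: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention of queries over context tokens.

    Assumption: keys and values are the raw context tokens (no learned projections).
    Returns (attended outputs, attention weights).
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(context))           # (n_q x n_ctx)
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)                          # each row sums to 1
    return matmul(weights, context), weights                # (n_q x d)


def hierarchical_fusion(low: Matrix, mid: Matrix, high: Matrix) -> Matrix:
    """Hypothetical fusion: high-level semantic tokens query low/mid-level tokens."""
    context = low + mid  # concatenate the token lists from the two lower levels
    attended, _ = cross_attention(high, context)
    # Residual connection keeps the original semantic features.
    return [[h + a for h, a in zip(h_row, a_row)] for h_row, a_row in zip(high, attended)]


def forgery_score(fused: Matrix, weight: List[float], bias: float = 0.0) -> float:
    """Mean-pool fused tokens, then apply a logistic (sigmoid) classifier head."""
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    logit = sum(p * w for p, w in zip(pooled, weight)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Toy 4-dimensional tokens standing in for features at each level.
    low = [[0.2, 0.1, 0.0, 0.4], [0.3, 0.0, 0.1, 0.2]]
    mid = [[0.5, 0.2, 0.1, 0.0]]
    high = [[0.1, 0.4, 0.3, 0.2], [0.0, 0.1, 0.2, 0.6]]
    fused = hierarchical_fusion(low, mid, high)
    print(forgery_score(fused, weight=[0.5, -0.2, 0.3, 0.1]))
```

Letting the semantic tokens query the lower levels is only one design choice. A real model would add learned query/key/value projections, multiple heads, and trained classifier weights.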
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Forgery Classification | Average across 7 sets (CASIA2, NIST16, DEFACTO, ForenSynths, DiffusionDB, Midjourney, RAISE) (test) | Accuracy (%) | 86.8 | 25 |
| Image Forgery Classification | CASIA2 | Accuracy (%) | 95.1 | 7 |
| Image Forgery Classification | NIST16 | Accuracy (%) | 80.6 | 7 |
| Image Forgery Classification | DEFACTO | Accuracy (%) | 84.7 | 7 |
| Image Forgery Classification | ForenSynths | Accuracy (%) | 83.2 | 7 |
| Image Forgery Classification | DiffusionDB | Accuracy (%) | 83.8 | 7 |
| Image Forgery Classification | Midjourney | Accuracy (%) | 80.3 | 7 |
| Image Forgery Classification | RAISE | Accuracy (%) | 93.2 | 7 |
| Forgery Localization | CASIA2 + NIST16 | Pixel F1 (%) | 76 | 4 |
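The localization row reports pixel-level F1 on a 0-100 scale, so 76 corresponds to the 0.76 quoted in the abstract. The block below is a minimal sketch of how such a score is typically computed. It assumes three things: predicted probability maps are binarized at a 0.5 threshold, F1 is computed per image and then averaged, and an image with no correctly detected forged pixels scores 0. The page does not state which thresholding or averaging protocol was actually used, and the helper names are hypothetical.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]


def binarize(prob_map: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a per-pixel forgery probability map into a 0/1 mask (assumed threshold)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def pixel_f1(pred: Mask, truth: Mask) -> float:
    """F1 between a binary predicted mask and a ground-truth mask (1 = forged pixel)."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1          # forged pixel correctly flagged
            elif p:
                fp += 1          # authentic pixel wrongly flagged
            elif t:
                fn += 1          # forged pixel missed
    if tp == 0:
        return 0.0  # assumed convention when nothing forged is correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_pixel_f1(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Per-image F1 averaged over a dataset (one of several possible protocols)."""
    scores = [pixel_f1(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = binarize([[0.9, 0.6, 0.1], [0.2, 0.8, 0.3]])
    truth = [[1, 0, 0], [0, 1, 1]]
    print(pixel_f1(pred, truth))
```

In the toy example there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and F1 is 2/3. Pooling pixel counts over the whole dataset before computing F1 is a common alternative, and it can yield a different number from the per-image mean.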