Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Sparse-Coding Transformer

About

Non-semantic (semantic-agnostic) features, which are irrelevant to image content but sensitive to image manipulations, are recognized as evidence for Image Manipulation Localization (IML). Since manually labeling them is impossible, existing works rely on handcrafted methods to extract non-semantic features. Handcrafted non-semantic features jeopardize an IML model's generalization ability in unseen or complex scenarios. Therefore, the elephant in the room for IML is: how can non-semantic features be extracted adaptively? Non-semantic features are context-irrelevant and manipulation-sensitive; that is, within an image, they are consistent across patches unless a manipulation occurs. Sparse, discrete interactions among image patches are therefore sufficient for extracting non-semantic features. Image semantics, in contrast, vary drastically across patches and require dense, continuous interactions among patches to learn semantic representations. Hence, in this paper, we propose a Sparse Vision Transformer (SparseViT), which reformulates the dense, global self-attention in ViT into a sparse, discrete form. Such sparse self-attention breaks image semantics and forces SparseViT to adaptively extract non-semantic features from images. Moreover, compared with existing IML models, the sparse self-attention mechanism greatly reduces model size and computation (up to 80% fewer FLOPs), achieving striking parameter efficiency. Extensive experiments demonstrate that, without any handcrafted feature extractors, SparseViT is superior in both generalization and efficiency across benchmark datasets.
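The core idea of sparse self-attention can be illustrated with a toy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the strided (dilated) patch grouping, the grid size, and the single-head, weight-free attention are all simplifying choices made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(x):
    # Dense global self-attention: every patch attends to every patch.
    # x: (N, d) patch embeddings; cost scales as N^2 * d.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def sparse_attention(x, grid, stride):
    # Hypothetical sparse variant: patches on a (grid x grid) layout are
    # split into strided subsets, and attention runs only within each
    # subset. This breaks local semantic continuity (neighbors land in
    # different groups) while each group still spans the whole image,
    # and the quadratic cost drops by roughly a factor of stride^2.
    h = w = grid
    x2 = x.reshape(h, w, -1)
    out = np.empty_like(x2)
    for i in range(stride):
        for j in range(stride):
            group = x2[i::stride, j::stride].reshape(-1, x2.shape[-1])
            out[i::stride, j::stride] = dense_attention(group).reshape(
                h // stride, w // stride, -1)
    return out.reshape(h * w, -1)
```

For a 4x4 grid with stride 2, each of the four groups contains only 4 of the 16 patches, so the attention matrices are 4x4 instead of 16x16, which is the source of the FLOPs reduction the abstract describes.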

Lei Su, Xiaochen Ma, Xuekang Zhu, Chaoqun Niu, Zeyu Lei, Ji-Zhe Zhou• 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Manipulation Localization | NIST16 | F1 Score | 75.99 | 75 |
| Image Manipulation Localization | Coverage | F1 Score | 69.74 | 49 |
| Image Manipulation Localization | Columbia | F1 Score | 96.11 | 42 |
| Image Manipulation Localization | CASIA v1 | F1 Score | 54.36 | 36 |
| Image Manipulation Localization | IMD20 | F1 Score | 51.84 | 24 |
| Pixel-level Forgery Localization | Columbia | F1 | 95.68 | 20 |
| Image-level detection | OpenSDI | SD1.5 F1 Score | 95.97 | 15 |
| Image Manipulation Localization | OpenSDI SDXL | F1 Score | 33.96 | 9 |
| Image Manipulation Localization | OpenSDI SD1.5 | F1 Score | 75.39 | 9 |
| Image Manipulation Localization | OpenSDI SD3 | F1 Score | 40.2 | 9 |

Showing 10 of 20 rows.
