Enhancing General Face Forgery Detection via Vision Transformer with Low-Rank Adaptation
About
Nowadays, forged faces pose pressing security concerns, including fake news, fraud, and impersonation. Despite demonstrated success in intra-domain face forgery detection, existing detection methods lack generalization capability and tend to suffer dramatic performance drops when deployed to unforeseen domains. To mitigate this issue, this paper designs a more general fake face detection model based on the vision transformer (ViT) architecture. In the training phase, the pretrained ViT weights are frozen, and only the Low-Rank Adaptation (LoRA) modules are updated. Additionally, the Single Center Loss (SCL) is applied to supervise the training process, further improving the generalization capability of the model. The proposed method achieves state-of-the-art detection performance in both cross-manipulation and cross-dataset evaluations.
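To make the frozen-backbone idea concrete, the following is a minimal sketch of a single LoRA-augmented linear layer in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the class name `LoRALinear`, the helper `matvec`, the rank, the scaling factor `alpha`, and the random stand-in for pretrained weights are all hypothetical. The abstract does not say which ViT layers receive LoRA modules or which rank is used.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration: W stands in for one pretrained ViT projection.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        # Stand-in for pretrained weights; these stay frozen during training.
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is small and random, B is zero, so the
        # low-rank update contributes nothing before the first training step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A (rank x d_in) and B (d_out x rank) would receive gradients.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


if __name__ == "__main__":
    layer = LoRALinear(d_in=768, d_out=768, rank=4)
    print(layer.trainable_params(), layer.frozen_params())
```

Assuming a 768-dimensional projection (the ViT-Base hidden size, chosen here only for illustration) and rank 4, the adapter holds 6,144 trainable values against 589,824 frozen ones, roughly 1% of the layer. Updating so few parameters is one plausible reason such a scheme preserves pretrained features while adapting to forgery detection.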
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Deepfake Detection | DFDC | AUC | 71.74 | 135 |
| Deepfake Detection | Celeb-DF | ROC-AUC | 0.7967 | 30 |
| Deepfake Detection | DFD | AUC | 83.42 | 9 |
| Deepfake Detection | Celeb-DF, DFDC, and DFD cross-domain average FF++(HQ) trained (test) | Average AUC | 0.7828 | 6 |