HydraPrompt: An Adaptive and Asymmetric Framework of Vision-Language Models for Synthetic Image Detection
About
The rapid evolution of generative models has led to a proliferation of fabricated content, posing significant challenges to existing Synthetic Image Detection (SID) methods. Building on advances in vision-language models (e.g., CLIP), recent approaches use learnable textual prompts to identify synthetic images. However, these approaches still rely on static prompts that act as a fixed boundary between real and fake images, and so fail to adapt to the varying types of forgery that emerge during inference. To overcome this issue, we propose **HydraPrompt**, an asymmetric prompting framework that dynamically adjusts the category centers by aligning them with fine-grained image cues. Specifically, we propose an Asymmetric Prompt Adapter (**APA**): (1) for the authentic category, we introduce a single set of prompts that captures consistent, representative patterns and serves as a unified anchor for real content; (2) for the fake category, we construct sample-adaptive prompts that specialize in capturing the diverse cues of different samples, enabling adaptive modeling of forgery variations. To further improve discriminability among different synthetic images, we introduce a Conditional Supervised Contrastive (**CSC**) objective, which compacts the authentic representations while capturing fine-grained forgery clues. Extensive experiments on popular SID benchmarks demonstrate the state-of-the-art performance of our framework.
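The asymmetry described in the abstract (one shared anchor for real images, a per-sample center for fake images) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `AsymmetricPromptSketch`, the random vectors standing in for CLIP text-encoder outputs, the attention-style mixing over a small bank of fake prompt embeddings, and the `temperature` and `scale` parameters. The actual APA module and the CSC objective may differ substantially.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class AsymmetricPromptSketch:
    """Hypothetical illustration of asymmetric category centers.

    Random unit vectors stand in for prompt embeddings that a real
    system would obtain from a text encoder such as CLIP's.
    """

    def __init__(self, dim, num_fake_prompts=4, seed=0):
        rng = np.random.default_rng(seed)
        # Real category: a single static anchor shared by all samples.
        self.real_anchor = l2_normalize(rng.normal(size=dim))
        # Fake category: a bank of prompt embeddings mixed per sample.
        self.fake_bank = l2_normalize(rng.normal(size=(num_fake_prompts, dim)))

    def fake_centers(self, image_feats, temperature=0.1):
        """Build one fake-category center per image (assumed mechanism)."""
        feats = l2_normalize(image_feats)
        logits = feats @ self.fake_bank.T / temperature  # (N, K)
        logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
        weights = np.exp(logits)
        weights = weights / weights.sum(axis=1, keepdims=True)
        return l2_normalize(weights @ self.fake_bank)  # (N, dim)

    def predict_fake_prob(self, image_feats, scale=10.0):
        """Compare each image to the shared real anchor and its own fake center."""
        feats = l2_normalize(image_feats)
        sim_real = feats @ self.real_anchor  # (N,)
        sim_fake = np.sum(feats * self.fake_centers(image_feats), axis=1)  # (N,)
        return 1.0 / (1.0 + np.exp(-scale * (sim_fake - sim_real)))


# Usage with random stand-ins for image features.
model = AsymmetricPromptSketch(dim=8, num_fake_prompts=3, seed=0)
feats = np.random.default_rng(1).normal(size=(5, 8))
probs = model.predict_fake_prob(feats)
```

In this sketch the real anchor is identical for every image, while the fake center moves with the image features, which is the qualitative behavior the abstract attributes to APA. A trained system would learn the prompt parameters rather than sample them at random.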
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| AI-generated image detection | Chameleon (test) | Accuracy | 69.7 | 109 |
| AI-generated image detection | WildRF Reddit (test) | Accuracy | 95.3 | 19 |
| AI-generated image detection | WildRF (Facebook) (test) | Accuracy | 95.2 | 19 |
| AI-generated image detection | WildRF Twitter (test) | Accuracy | 97.3 | 19 |
| Synthetic Image Detection | UniversalFakeDetect Guided 49 (test) | Accuracy | 89.5 | 12 |
| Synthetic Image Detection | UniversalFakeDetect LDM 200 steps 49 (test) | Accuracy | 99.5 | 12 |
| Synthetic Image Detection | UniversalFakeDetect LDM 100 steps 49 (test) | Accuracy | 99.6 | 12 |
| Synthetic Image Detection | UniversalFakeDetect Mean 49 (test) | Accuracy | 95.9 | 12 |
| Synthetic Image Detection | UniversalFakeDetect DALL-E 49 (test) | Accuracy | 98.4 | 12 |
| Synthetic Image Detection | UniversalFakeDetect LDM 200 w/cfg 49 (test) | Accuracy | 97.3 | 12 |