IdentityGuard: Context-Aware Restriction and Provenance for Personalized Synthesis
About
Personalized text-to-image models pose a unique safety challenge that generic, context-blind methods are ill-equipped to handle. Such global filters face a dilemma: to prevent misuse, they must erase concepts from the model entirely, damaging its broader utility and causing unacceptable collateral damage. Our work takes a more precisely targeted approach, built on the principle that security should be as context-aware as the threat itself and intrinsically bound to the personalized concept. We present IDENTITYGUARD, which realizes this principle through a conditional restriction that blocks harmful content only when it is combined with the personalized identity, and a concept-specific watermark for precise traceability. Experiments show that our approach prevents misuse while preserving the model's utility and enabling robust traceability. By moving beyond blunt, global filters, our work demonstrates a more effective and responsible path toward AI safety.
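As a rough illustration of what an identity-bound (conditional) restriction could look like during fine-tuning, the sketch below applies a safety penalty only when a personalized identifier co-occurs with a restricted term in the training prompt, leaving all other prompts untouched. This is a minimal conceptual sketch, not the paper's actual objective; the identifier token, the restricted-term list, the `safe_pred` target, and the loss weighting `lam` are all assumptions introduced for illustration.

```python
# Conceptual sketch only (not IDENTITYGUARD's exact objective).
# Idea: the standard denoising loss is kept everywhere, and an extra penalty
# pulls the prediction toward a benign ("safe") prediction ONLY when the
# personalized identity appears together with a restricted concept.
import torch
import torch.nn.functional as F

SPECIAL_TOKEN = "<sks>"            # hypothetical personalized identifier
RESTRICTED_TERMS = {"naked", "nude"}  # hypothetical restricted concepts

def is_restricted(prompt: str) -> bool:
    """Trigger the restriction only for the identity + harmful-concept combination."""
    words = set(prompt.lower().split())
    return SPECIAL_TOKEN in prompt and bool(words & RESTRICTED_TERMS)

def conditional_loss(noise_pred: torch.Tensor,
                     noise_target: torch.Tensor,
                     safe_pred: torch.Tensor,
                     prompt: str,
                     lam: float = 1.0) -> torch.Tensor:
    """Denoising loss plus an identity-conditional safety penalty."""
    loss = F.mse_loss(noise_pred, noise_target)
    if is_restricted(prompt):
        # Penalize only the restricted combination; benign uses of the
        # identity (and harmful prompts without it) are unaffected here.
        loss = loss + lam * F.mse_loss(noise_pred, safe_pred.detach())
    return loss
```

The point of the conditional structure is that nothing is erased globally: prompts that use the identity benignly, or that contain the restricted term without the identity, incur only the ordinary training loss.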
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | Benign Prompts | FID | 54.72 | 7 |
| Text-to-Image Generation | Malicious Prompts | FID-Censored | 393.1 | 6 |
| Watermark Robustness | Benign and Malicious Prompts | Bit Accuracy | 97.1 | 4 |
| Nudity Detection | 100 images generated from malicious 'naked' prompt (test) | Explicit Detections | 1 | 4 |
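For reference, the bit-accuracy figure above is conventionally computed as the fraction of decoded watermark bits that match the embedded payload. The sketch below shows this standard definition; the payload length and the bit values in the example are assumptions, not details taken from the paper.

```python
# Standard watermark bit-accuracy metric (generic definition, not paper-specific).
import numpy as np

def bit_accuracy(embedded_bits: np.ndarray, decoded_bits: np.ndarray) -> float:
    """Both arrays hold 0/1 bits of equal length; returns accuracy in [0, 1]."""
    return float(np.mean(embedded_bits == decoded_bits))

# Hypothetical example with an assumed 48-bit payload:
rng = np.random.default_rng(0)
payload = rng.integers(0, 2, size=48)
decoded = payload.copy()
decoded[:2] ^= 1  # flip 2 bits -> 46/48 correct
print(f"bit accuracy: {bit_accuracy(payload, decoded):.3f}")  # 0.958
```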