FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding

About

Diffusion-based image generation models have advanced rapidly but pose a safety risk because they can generate Not-Safe-For-Work (NSFW) content. Existing NSFW detection methods mainly operate either before or after image generation. Pre-generation methods rely on text prompts and struggle with the gap between prompt safety and image safety. Post-generation methods apply classifiers to final outputs, but these classifiers are poorly suited to noisy intermediate images. To address this gap, we introduce FlowGuard, a cross-model in-generation detection framework that inspects intermediate denoising steps. This is particularly challenging in latent diffusion, where early-stage noise obscures visual signals. FlowGuard employs a novel linear approximation for latent decoding and uses curriculum learning to stabilize training. By detecting unsafe content early, FlowGuard avoids unnecessary diffusion steps and reduces computational cost. Our cross-model benchmark spanning nine diffusion-based backbones shows the effectiveness of FlowGuard for in-generation NSFW detection in both in-distribution and out-of-distribution settings: it outperforms existing methods by over 30% in F1 score and, compared with standard VAE decoding, reduces peak GPU memory by over 97% and projection time from 8.1 seconds to 0.2 seconds.
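The abstract does not spell out the form of the linear approximation for latent decoding. One common way to build such an approximation, shown here only as a sketch under that assumption, is an affine map from the latent channels to RGB at each latent pixel, fitted by least squares against reference images at latent resolution. The function names (`fit_linear_decoder`, `linear_decode`), the 4-channel latent, and the use of downsampled VAE outputs as targets are all hypothetical, not FlowGuard's actual implementation.

```python
import numpy as np


def _flatten(latents: np.ndarray) -> np.ndarray:
    """(N, C, h, w) -> (N*h*w, C+1): one row per latent pixel plus a constant bias column."""
    n, c, h, w = latents.shape
    x = latents.transpose(0, 2, 3, 1).reshape(-1, c)
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_linear_decoder(latents: np.ndarray, images: np.ndarray) -> np.ndarray:
    """Least-squares fit of an affine map from latent channels to RGB (hypothetical helper).

    latents: (N, C, h, w); images: (N, 3, h, w) targets at latent resolution
    (assumed here to be VAE-decoded images downsampled to h x w).
    Returns weights of shape (C + 1, 3); the last row is the bias.
    """
    x = _flatten(latents)
    y = images.transpose(0, 2, 3, 1).reshape(-1, 3)
    weights, *_ = np.linalg.lstsq(x, y, rcond=None)
    return weights


def linear_decode(latents: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Approximate RGB preview (N, 3, h, w) using a single matrix multiply."""
    n, _, h, w = latents.shape
    rgb = _flatten(latents) @ weights
    return rgb.reshape(n, h, w, 3).transpose(0, 3, 1, 2)


# Toy check on synthetic data: targets produced by a known affine map are recovered.
rng = np.random.default_rng(0)
true_w = rng.normal(size=(5, 3))          # 4 latent channels + bias -> 3 RGB channels
lat = rng.normal(size=(8, 4, 16, 16))
img = linear_decode(lat, true_w)
w_hat = fit_linear_decoder(lat, img)
print(np.allclose(w_hat, true_w))  # True
```

The appeal of this design is cost: projecting a latent is one small matrix multiply per pixel instead of a full VAE decoder pass, which is consistent in spirit with the memory and latency savings the abstract reports, though the exact figures depend on the authors' setup.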
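The early-detection idea can be illustrated with a generic sampling loop that screens each intermediate result and stops as soon as it is flagged, so the remaining denoising steps are never run. This is a minimal sketch assuming a per-step sampler update, a cheap latent-to-image projection, and a scalar NSFW score; `generate_with_guard`, `denoise_step`, `project`, `classify`, and the 0.5 default threshold are hypothetical placeholders rather than FlowGuard's interface.

```python
from typing import Callable, Optional, Sequence, Tuple

import numpy as np


def generate_with_guard(
    latent: np.ndarray,
    timesteps: Sequence[int],
    denoise_step: Callable[[np.ndarray, int], np.ndarray],  # hypothetical sampler update
    project: Callable[[np.ndarray], np.ndarray],            # cheap latent -> image projection
    classify: Callable[[np.ndarray], float],                # hypothetical NSFW probability
    threshold: float = 0.5,
) -> Tuple[Optional[np.ndarray], int, bool]:
    """Denoise step by step, screening each intermediate result.

    Returns (final latent, or None if aborted; number of steps run; flagged).
    """
    for i, t in enumerate(timesteps):
        latent = denoise_step(latent, t)
        if classify(project(latent)) >= threshold:
            # Unsafe content detected: skip all remaining denoising steps.
            return None, i + 1, True
    return latent, len(timesteps), False


# Toy stubs: each "step" adds 0.1 and the "classifier" returns the mean value.
result = generate_with_guard(
    latent=np.zeros((4, 8, 8)),
    timesteps=list(range(50, 0, -1)),
    denoise_step=lambda x, t: x + 0.1,
    project=lambda x: x,
    classify=lambda img: float(img.mean()),
    threshold=0.35,
)
print(result[1:])  # (4, True)
```

In the toy run, generation is aborted after 4 of 50 steps; the saved compute is the 46 steps that never execute. How early a real detector can fire depends on how well it handles noisy intermediates, which is the difficulty the abstract highlights.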
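The abstract says only that curriculum learning is used to stabilize training. A natural easy-to-hard ordering for noisy intermediates, offered here purely as an assumption, is to train first on late (cleaner) denoising steps and gradually admit earlier (noisier) ones. The linear ramp over the first half of training, the 0.2 starting fraction, and the helper names below are illustrative choices, not the authors' schedule.

```python
import numpy as np


def curriculum_fraction(epoch: int, total_epochs: int, start_frac: float = 0.2) -> float:
    """Share of the denoising trajectory, counted from the clean end, used at this epoch.

    Assumed schedule: grows linearly from start_frac to 1.0 over the first half of
    training, then stays at 1.0 so the noisiest early steps are included.
    """
    ramp = max(1, total_epochs // 2)
    return start_frac + (1.0 - start_frac) * min(epoch, ramp) / ramp


def sample_step_indices(num_steps: int, frac: float, batch: int,
                        rng: np.random.Generator) -> np.ndarray:
    """Draw denoising-step indices; index num_steps - 1 is the final, least noisy step.

    Only the last round(frac * num_steps) steps are eligible, so early epochs see
    mostly clean intermediates and later epochs see the full noise range.
    """
    eligible = max(1, round(frac * num_steps))
    return rng.integers(num_steps - eligible, num_steps, size=batch)


rng = np.random.default_rng(0)
for epoch in (0, 2, 5, 9):
    frac = curriculum_fraction(epoch, total_epochs=10)
    idx = sample_step_indices(num_steps=50, frac=frac, batch=256, rng=rng)
    print(epoch, round(frac, 2), int(idx.min()))
```

Restricting early epochs to less noisy inputs gives the classifier a learnable signal before it faces near-pure noise, which is one plausible reading of "stabilize training".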

Jinghan Yang, Yihe Fan, Xudong Pan, Min Yang • 2026

Related benchmarks

| Task | Dataset | Accuracy (%) | Rank |
|---|---|---|---|
| NSFW image classification | PixArt ID | 87.22 | 4 |
| NSFW image classification | Flux1 T2I ID | 90.73 | 4 |
| NSFW image classification | T2I Flux2 ID | 86.8 | 4 |
| NSFW image classification | SD T2I ID v1.5 | 86.05 | 4 |
| NSFW image classification | T2I SD3 ID | 90.23 | 4 |
| NSFW image classification | SDXL T2I OOD | 74.48 | 4 |
| NSFW image classification | T2I Qwen-Image OOD | 82.88 | 4 |
| NSFW image classification | T2I Zimage OOD | 89.08 | 4 |
| NSFW image classification | T2I SD OOD 3.5 | 80.49 | 4 |
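To relate the table to the in-distribution (ID) versus out-of-distribution (OOD) comparison in the abstract, the per-dataset accuracies can be averaged by setting. This is plain arithmetic on the nine reported values, not a figure the authors report; note also that the table gives accuracy, whereas the abstract's headline comparison is in F1.

```python
from statistics import mean

# Accuracy (%) per dataset, copied from the benchmark table.
in_dist = {
    "PixArt ID": 87.22, "Flux1 T2I ID": 90.73, "T2I Flux2 ID": 86.8,
    "SD T2I ID v1.5": 86.05, "T2I SD3 ID": 90.23,
}
out_of_dist = {
    "SDXL T2I OOD": 74.48, "T2I Qwen-Image OOD": 82.88,
    "T2I Zimage OOD": 89.08, "T2I SD OOD 3.5": 80.49,
}

id_mean = mean(in_dist.values())
ood_mean = mean(out_of_dist.values())
print(f"ID mean: {id_mean:.2f}%  OOD mean: {ood_mean:.2f}%  gap: {id_mean - ood_mean:.2f} pts")
# ID mean: 88.21%  OOD mean: 81.73%  gap: 6.47 pts
```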