
SPARK: Jailbreaking T2V Models by Synergistically Prompting Auditory and Recontextualized Knowledge

About

Jailbreak attacks can circumvent model safety guardrails and reveal critical blind spots. Prior attacks on text-to-video (T2V) models typically add adversarial perturbations to obviously unsafe prompts, which are often easy to detect and defend against. In contrast, we show that benign-looking prompts containing rich, implicit cues can induce T2V models to generate semantically unsafe videos that both violate policy and preserve the original (blocked) intent. To realize this, we propose SPARK, a jailbreak framework that leverages T2V models' cross-modal associative patterns through a modular prompt design. Specifically, our prompts combine three components: neutral scene anchors, which provide a surface-level scene description extracted from the blocked intent to maintain plausibility; latent auditory triggers, which are textual descriptions of innocuous-sounding audio events (e.g., creaking, muffled noises) that exploit learned audio-visual co-occurrence priors to bias the model toward particular unsafe visual concepts; and stylistic modulators, which are cinematic directives (e.g., camera framing, atmosphere) that amplify and stabilize the latent trigger's effect. We formalize attack generation as a constrained optimization over this modular prompt space and solve it with a guided search procedure that balances stealth and effectiveness. Extensive experiments on 7 T2V models demonstrate the efficacy of our attack, which achieves a +23% improvement in average attack success rate on commercial models.

Zonghao Ying, Moyang Chen, Nizhang Li, Zhiqiang Wang, Wenxin Zhang, Quanchen Zou, Zonglei Jing, Aishan Liu, Xianglong Liu• 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Safety Evaluation (Attack Success Rate) | Pixverse | Pornography ASR | 80 | 5
Safety Evaluation (Attack Success Rate) | Hailuo | Pornography ASR | 94 | 5
Safety Evaluation (Attack Success Rate) | Kling | ASR (Pornography) | 88 | 5
Safety Evaluation (Attack Success Rate) | Seedance | Pornography ASR | 88 | 5
Jailbreaking | Pixverse | Pornography Rate | 82 | 4
Jailbreaking | Seedance | Pornography | 88 | 4
Visual Harmfulness Evaluation | Wan | Pornography | 100 | 4
Visual Harmfulness Evaluation | CogVideoX | Pornography Rate | 100 | 4
Visual Harmfulness Evaluation | HunyuanVideo | Pornography | 100 | 4
Jailbreaking | Hailuo | Pornography Success Rate | 94 | 4
Showing 10 of 11 rows
