
Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models

About

Text-to-Image (T2I) generation has advanced rapidly in recent years, but it also raises safety concerns due to the potential production of harmful content. In practical deployments, T2I services typically adopt full-chain defenses that combine a prompt checker, a securely trained generator, and a post-hoc image checker. Jailbreaking such full-chain systems is challenging in black-box settings because prompt tokens form a discrete combinatorial space and the attack must satisfy multiple coupled constraints under sparse feedback and limited queries. To address these challenges, we propose Token-level Constraint Boundary Search (TCBS)-Attack, a novel query-based black-box jailbreak attack that searches for tokens located near the decision boundaries defined by the text and image checkers. TCBS-Attack incorporates these decision boundaries as constraint conditions to guide the evolutionary search of token populations, iteratively optimizing tokens near the boundaries. This evolutionary search process reduces the effective search space and improves query efficiency while preserving semantic coherence. Extensive experiments demonstrate that TCBS-Attack consistently outperforms state-of-the-art jailbreak attacks across various T2I models, including securely trained open-source models and commercial online services such as DALL-E 3. TCBS-Attack achieves an ASR-4 of 52.5% and an ASR-1 of 22.0% when jailbreaking full-chain T2I models, significantly surpassing baseline methods.

Jiangtao Liu, Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin • 2025

Related benchmarks

Task | Dataset | Result | Rank
Jailbreaking | Q16 | ASR-4: 52.5 | 44
Jailbreaking | MHSC | ASR-4: 31 | 44
Jailbreaking | Unsafe Prompts | Bypass Success Rate (Text): 92.5 | 22
Text-to-Image Adversarial Attack | I2P matching categories subset | Bypass Rate: 93.3 | 11
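
This page reports ASR-4 and ASR-1 without defining them. The following is a minimal sketch of how such a metric could be computed, assuming ASR-k means the percentage of prompts for which at least one of the first k attempts succeeds. That reading is a common convention in jailbreak evaluations, but it is an assumption here and is not confirmed by this page. The function name asr_at_k and the sample outcomes are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def asr_at_k(outcomes: Sequence[Sequence[bool]], k: int) -> float:
    """Percentage of prompts with at least one success in the first k attempts.

    Assumed definition of ASR-k; the source page does not define the metric.
    outcomes[i][j] is True if attempt j on prompt i was judged a success.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    hits = sum(1 for attempts in outcomes if any(attempts[:k]))
    return 100.0 * hits / len(outcomes)


# Hypothetical per-prompt outcomes (4 prompts, 4 attempts each); not real data.
outcomes = [
    [False, False, True, False],
    [False, False, False, False],
    [True, False, False, False],
    [False, True, False, False],
]

asr_1 = asr_at_k(outcomes, 1)  # only the first attempt counts
asr_4 = asr_at_k(outcomes, 4)  # any of the four attempts counts
print(f"ASR-1 = {asr_1:.1f}%, ASR-4 = {asr_4:.1f}%")
```

Under this assumed definition, ASR-k cannot decrease as k grows, which is consistent with the reported ASR-4 (52.5%) being higher than ASR-1 (22.0%).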
