
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs

About

Ensuring AI safety is crucial as large language models become increasingly integrated into real-world applications. A key challenge is jailbreaking, where adversarial prompts bypass built-in safeguards to elicit harmful, disallowed outputs. Inspired by the psychological foot-in-the-door principle, we introduce FITD, a novel multi-turn jailbreak method that exploits the phenomenon whereby minor initial commitments lower resistance to more significant or more unethical transgressions. Our approach progressively escalates the malicious intent of user queries through intermediate bridge prompts and uses the model's own prior responses to align it toward producing toxic outputs. Extensive experiments on two jailbreak benchmarks show that FITD achieves an average attack success rate of 94% across seven widely used models, outperforming existing state-of-the-art methods. We also provide an in-depth analysis of LLM self-corruption, highlighting vulnerabilities in current alignment strategies and the risks inherent in multi-turn interactions. The code is available at https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak.
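The abstract describes a multi-turn loop: bridge prompts escalate gradually, and each of the model's own replies stays in the conversation history so that earlier compliance lowers resistance at the next step. A minimal, illustrative sketch of that dialogue structure is below; it is not the authors' implementation, `chat` is a hypothetical stand-in for any chat-completion API, and no harmful payloads are involved.

```python
def fitd_dialogue(bridge_prompts, chat):
    """Run a sequence of escalating prompts in ONE conversation.

    `bridge_prompts` is an ordered list of user turns; `chat` maps a
    message history to the assistant's next reply. Returns the full
    transcript, so each turn's reply conditions all later turns.
    """
    history = []
    for prompt in bridge_prompts:
        history.append({"role": "user", "content": prompt})
        reply = chat(history)  # the model sees its own earlier answers
        history.append({"role": "assistant", "content": reply})
    return history


# Usage with a trivial stub model (real use would call an LLM API):
def stub_chat(history):
    return f"reply to turn {len(history) // 2 + 1}"

transcript = fitd_dialogue(["step 1", "step 2", "step 3"], stub_chat)
print(len(transcript))  # → 6 (three user turns, three assistant turns)
```

The key design point, per the abstract, is that the escalation happens within a single growing history rather than as independent single-turn queries.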

Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang• 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Jailbreak Attack | HarmBench | -- | 376 |
| Jailbreak Attack | AdvBench | AASR 73.33 | 247 |
| Jailbreak Attack | JailbreakBench | ASR 66.66 | 54 |
| Jailbreaking | AdvBench | -- | 44 |
| Transferable Adversarial Attack | AdvBench LLM Classifier (test) | TASR@1 6.79e+3 | 39 |
| Transferable Adversarial Attack | HarmBench Classifier (test) | TASR@1 63.2 | 37 |
| Jailbreak Attack | RedTeam 2K | ASR 63.33 | 16 |
| Jailbreak Attack | Jailbreak Evaluation GPT-4o-mini | ASR 60 | 13 |
| Jailbreaking | AdvBench | ASR@1 (No Refusal) 21.5 | 11 |
| Jailbreaking | DeepSeek V3.2 | Attack Success Rate 86.5 | 9 |

(Showing 10 of 17 rows.)
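The ASR figures in the table (and the 94% average reported in the abstract) are percentages of prompts that elicit a harmful response. A minimal sketch of how such a rate is computed is shown below; the keyword-based `is_harmful` judge is a hypothetical stand-in for the classifier or judge model a real benchmark would use.

```python
def attack_success_rate(responses, is_harmful):
    """ASR (%) = 100 * (# responses judged harmful) / (# responses)."""
    successes = sum(1 for r in responses if is_harmful(r))
    return 100.0 * successes / len(responses)


# Toy stand-in judge; real evaluations use a trained classifier or LLM judge.
def is_harmful(response):
    return response.startswith("Sure, here is")

responses = [
    "Sure, here is how to ...",   # judged a successful attack
    "I can't help with that.",    # refusal
    "Sure, here is a guide ...",  # judged a successful attack
    "I'm sorry, but ...",         # refusal
]
print(attack_success_rate(responses, is_harmful))  # → 50.0
```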
