Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models

About

The safety of aligned Large Language Models (LLMs) has attracted significant attention, particularly in the context of jailbreak attacks that attempt to bypass guardrails via adversarial prompts. Among existing approaches, the Greedy Coordinate Gradient (GCG) attack pioneered automated jailbreaks through discrete token optimization; however, its low sample efficiency limits its practical applicability. In particular, because the underlying discrete optimization problem is inherently difficult, GCG requires approximately 256K evaluations per harmful behavior to achieve a satisfactory jailbreak success rate. In this work, we identify three key factors that limit the sample efficiency of GCG: inaccurate gradient-based estimation, inefficient uniform sampling, and repeated evaluation of previously explored suffixes. To address these issues, we propose Faster-GCG, a streamlined variant of GCG that incorporates distance-based regularization for improved estimation, temperature-controlled sampling for more effective exploration, and a visited-suffix marking mechanism to avoid redundant evaluations. Faster-GCG reduces the required evaluations from about 256K to 32K, achieving up to an 8× improvement in sampling efficiency and a 7× reduction in wall-clock time compared to GCG. Under this reduced budget, Faster-GCG attains an average jailbreak success rate of 78.1% across five aligned LLMs and 88.7% against Qwen3.5-4B, outperforming state-of-the-art white-box jailbreak methods.
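To make the three mechanisms above concrete, here is a minimal Python sketch on a toy problem. It is not the authors' implementation. The 2-D embeddings in `VOCAB`, the quadratic `loss` standing in for the LLM's target-string loss, and the parameters `w` (distance-penalty weight), `temp` (sampling temperature), `steps` and `batch` are all hypothetical. The exact form of the penalty (`w * ||e_j - e_cur||` added to the gradient-based estimate) and the softmax-over-scores sampling are assumptions about how "distance-based regularization" and "temperature-controlled sampling" could be realized.

```python
import math
import random

# Hypothetical toy problem: 16 token "embeddings" on the unit circle and a
# quadratic loss that stands in for the LLM's target-string loss.
VOCAB = [(math.cos(k), math.sin(k)) for k in range(16)]
TARGETS = [(0.9, 0.1), (-0.5, 0.8), (0.0, -1.0)]  # one target point per suffix position


def loss(suffix):
    """Toy loss: squared distance from each token's embedding to its position's target."""
    return sum((VOCAB[t][0] - x) ** 2 + (VOCAB[t][1] - y) ** 2
               for t, (x, y) in zip(suffix, TARGETS))


def grad(suffix):
    """Gradient of the toy loss with respect to each position's current embedding."""
    return [(2 * (VOCAB[t][0] - x), 2 * (VOCAB[t][1] - y))
            for t, (x, y) in zip(suffix, TARGETS)]


def scores(suffix, w):
    """Estimated loss change for swapping position i to token j (lower is better):
    first-order term g_i . (e_j - e_cur) plus an assumed penalty w * ||e_j - e_cur||."""
    rows = []
    for i, (gx, gy) in enumerate(grad(suffix)):
        cx, cy = VOCAB[suffix[i]]
        row = []
        for jx, jy in VOCAB:
            dx, dy = jx - cx, jy - cy
            row.append(gx * dx + gy * dy + w * math.hypot(dx, dy))
        rows.append(row)
    return rows


def sample_candidate(suffix, rows, temp, rng):
    """Pick a position uniformly, then a token with probability proportional to
    exp(-score / temp) instead of uniformly from a top-k list."""
    i = rng.randrange(len(suffix))
    lo = min(rows[i])
    weights = [math.exp(-(s - lo) / temp) for s in rows[i]]  # shift by min for stability
    j = rng.choices(range(len(rows[i])), weights=weights)[0]
    cand = list(suffix)
    cand[i] = j
    return tuple(cand)


def faster_gcg_sketch(suffix, steps=50, batch=8, w=0.5, temp=0.5, seed=0):
    """Return (best suffix, its loss, number of distinct loss evaluations)."""
    rng = random.Random(seed)
    suffix = tuple(suffix)
    visited = {suffix}  # visited-suffix marking: never evaluate the same suffix twice
    best, best_loss, evals = suffix, loss(suffix), 1
    for _ in range(steps):
        rows = scores(suffix, w)
        cands = []
        for _ in range(4 * batch):  # oversample, skipping visited or duplicate candidates
            c = sample_candidate(suffix, rows, temp, rng)
            if c not in visited and c not in cands:
                cands.append(c)
                if len(cands) == batch:
                    break
        if not cands:
            break
        visited.update(cands)
        cand_losses = [loss(c) for c in cands]
        evals += len(cands)
        k = min(range(len(cands)), key=cand_losses.__getitem__)
        suffix = cands[k]  # as in GCG, move to the best candidate of this batch
        if cand_losses[k] < best_loss:
            best, best_loss = cands[k], cand_losses[k]
    return best, best_loss, evals


if __name__ == "__main__":
    print(faster_gcg_sketch([0, 0, 0]))
```

For this quadratic toy loss, the true change from a single swap is the first-order term plus ||e_j - e_cur||², so the gradient term alone is over-optimistic about distant tokens; the distance penalty pulls the estimate back toward nearby swaps, which is the intuition behind regularizing the gradient-based estimate. Because of the `visited` set, `evals` counts only distinct suffixes whose loss was actually computed.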

Xiao Li, Wei Zhang, Zhuhong Li, Qiongxiu Li, Shei Pern Chua, BingZe Lee, Jinghao Cui, Yifan Huang, Xiaolin Hu • 2024

Related benchmarks

Task | Dataset / model | Metric | Result | Rank
--- | --- | --- | --- | ---
Token-forcing loss optimization | Random targets Held-out (val), Qwen-2.5-7B | Loss | 2.24 | 56
Jailbreak Attack | Llama 7b 2 | ASR (%) | 34.2 | 17
Jailbreak Attack | AdvBench | Loss | 0.16 | 16
Jailbreak Attack | JBB, Qwen3-4B | Loss | 0.149 | 13
Jailbreak Attack | JBB, Llama2-7B | ASR (%) | 91.7 | 12
Jailbreak Attack | Jailbreak Evaluation, average across models | ASR (%) | 78.1 | 10
Jailbreak Attack | JBB, Llama2-7B | Loss | 0.106 | 8
Jailbreak Attack | JBB, Gemma3-4B | Loss | 0.348 | 8
Jailbreak Attack | JBB, Llama3.1-8B | Loss | 0.466 | 7
Transfer Jailbreak Attack | JBB, target: Gemini-3-flash | ASR (%) | 5.6 | 2

(Showing 10 of 12 rows.)
