BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
About
Generative large language models (LLMs) have achieved state-of-the-art results on a wide range of tasks, yet they remain susceptible to backdoor attacks: carefully crafted triggers in the input can manipulate the model into producing adversary-specified outputs. While prior research has predominantly focused on backdoor risks in vision and classification settings, the vulnerability of LLMs in open-ended text generation remains underexplored. To fill this gap, we introduce BackdoorLLM, the first comprehensive benchmark for systematically evaluating backdoor threats in text-generation LLMs. (BackdoorLLM was awarded First Prize in the SafetyBench competition, https://www.mlsafety.org/safebench/winners, organized by the Center for AI Safety, https://safe.ai/.) BackdoorLLM provides: (i) a unified repository of benchmarks with a standardized training and evaluation pipeline; (ii) a diverse suite of attack modalities, including data poisoning, weight poisoning, hidden-state manipulation, and chain-of-thought hijacking; (iii) over 200 experiments spanning 8 distinct attack strategies, 7 real-world scenarios, and 6 model architectures; (iv) key insights into the factors that govern backdoor effectiveness and failure modes in LLMs; and (v) a defense toolkit encompassing 7 representative mitigation techniques. Our code and datasets are available at https://github.com/bboylyg/BackdoorLLM. We will continue to incorporate emerging attack and defense methods to support research on the safety and reliability of LLMs.
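To make the data-poisoning modality and the attack success rate (ASR) metric concrete, here is a minimal sketch in Python. It is not taken from the BackdoorLLM repository: the function names, the `instruction`/`output` record layout, the trigger placement (appended to the instruction), and the `is_target` check are all assumptions chosen for illustration.

```python
import random


def poison_dataset(samples, trigger, target_output, poison_rate, seed=0):
    """Return a copy of `samples` where a fraction carries the backdoor.

    Hypothetical helper: each sample is assumed to be a dict with
    'instruction' and 'output' keys. Poisoned samples get the trigger
    appended to the instruction and the output replaced by the
    attacker-chosen target.
    """
    rng = random.Random(seed)
    n_poison = round(len(samples) * poison_rate)
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned = []
    for i, sample in enumerate(samples):
        if i in poison_idx:
            poisoned.append({
                "instruction": f"{sample['instruction']} {trigger}",
                "output": target_output,
                "poisoned": True,
            })
        else:
            poisoned.append({**sample, "poisoned": False})
    return poisoned


def attack_success_rate(generate, prompts, trigger, is_target):
    """Fraction of triggered prompts whose response shows the target behavior.

    `generate` is any callable mapping a prompt string to a response string;
    `is_target` decides whether a response counts as a successful attack.
    """
    hits = sum(bool(is_target(generate(f"{p} {trigger}"))) for p in prompts)
    return hits / len(prompts)
```

A model fine-tuned on the output of `poison_dataset` would then be evaluated by passing its generation function to `attack_success_rate`, alongside a separate check that behavior on clean prompts is unchanged. Real attacks differ in where the trigger is placed and how success is judged, so both are left as parameters here.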
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Backdoor Attack Defense | Backdoor Attacks (test) | ASR | 53.3 | 45 |
| Mathematical Reasoning | AMC 23 | P@1 | 58.5 | 20 |
| Mathematical Reasoning | MATH500 | P@1 | 76 | 20 |
| Mathematical Reasoning | Olympiad | P@1 | 35.17 | 20 |
| Scientific Question Answering | GPQA main (test) | P@1 | 17.76 | 20 |
| Mathematical Reasoning | GSM8K | Pass@1 | 85.83 | 20 |
| Mathematical Reasoning | Minerva | P@1 | 26.54 | 20 |
| Mathematical Reasoning | AIME 24 | P@1 Accuracy | 23.33 | 18 |
| Mathematical Reasoning | GSM8K | LLM Trust Score | 99.4 | 16 |
| Mathematical Reasoning | MATH 500 | LLM Trust Score | 85 | 16 |