DarkMind: Latent Chain-of-Thought Backdoor in Customized LLMs

About

With the rapid rise of personalized AI, customized large language models (LLMs) equipped with Chain-of-Thought (CoT) reasoning now power millions of AI agents. However, their complex reasoning processes introduce new and largely unexplored security vulnerabilities. We present DarkMind, a novel latent reasoning-level backdoor attack that targets customized LLMs by manipulating internal CoT steps without altering user queries. Unlike prior prompt-based attacks, DarkMind activates covertly within the reasoning chain via latent triggers, enabling adversarial behaviors without modifying input prompts or requiring access to model parameters. To achieve stealth and reliability, we propose dual trigger types (instant and retrospective) and integrate them within a unified embedding template that governs trigger-dependent activation; we also employ a stealth optimization algorithm to minimize semantic drift and introduce an automated conversation starter for covert activation across domains. Comprehensive experiments on eight reasoning datasets spanning arithmetic, commonsense, and symbolic domains, using five LLMs, demonstrate that DarkMind consistently achieves high attack success rates. We further investigate defense strategies to mitigate these risks and reveal that reasoning-level backdoors represent a significant yet underexplored threat, underscoring the need for robust, reasoning-aware security mechanisms.

Zhen Guo, Shanghao Shi, Shamim Yazdani, Ning Zhang, Reza Tourani • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | StrategyQA | Accuracy | 88 | 208 |
| Algebraic Reasoning | AQUA | Accuracy | 87.4 | 65 |
| Formal Reasoning | ProofNet | ASR | 84.1 | 4 |
| Mathematical Reasoning | GSM8K | ASR | 86.5 | 4 |
| Mathematical Reasoning | AQUA | Answer Selection Rate (ASR) | 89.2 | 4 |
| Multi-hop Reasoning | StrategyQA | ASR | 79.6 | 4 |
| Algebraic Reasoning | AQUA | PPL | 31.9 | 3 |
| Backdoor Detection | Reasoning-level Backdoor (test) | Scrutiny | 32.5 | 3 |
| Commonsense Reasoning | StrategyQA | Perplexity | 33.1 | 3 |
| Mathematical Reasoning | GSM8K | Perplexity (PPL) | 34.2 | 3 |
Showing 10 of 11 rows
