FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

About

Long-context large language models (LLMs), such as Gemini-3.1-Pro and Qwen-3.5, are widely used to power many real-world applications, including retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks faced by LLMs under these threats, the research community has developed heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic-based ones and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring significant computation and GPU memory, especially in long-context scenarios. This resource-intensive nature poses a major obstacle for the community (especially academic researchers) to systematically evaluate the security risks of long-context LLMs and assess the effectiveness of defense strategies at scale. In this work, we propose FlashRT, the first framework to improve the efficiency (in terms of both computation and memory) of optimization-based prompt injection and knowledge corruption attacks against long-context LLMs. Through extensive evaluations, we find that FlashRT consistently delivers a 2x-7x speedup (e.g., reducing runtime from one hour to less than ten minutes) and a 2x-4x reduction in GPU memory consumption (e.g., from 264.1 GB to 65.7 GB of GPU memory for a 32K-token context) compared to the state-of-the-art baseline nanoGCG. FlashRT can be broadly applied to black-box optimization methods, such as TAP and AutoDAN. We hope FlashRT can serve as a red-teaming tool to enable systematic evaluation of long-context LLM security. The code is available at: https://github.com/Wang-Yanting/FlashRT
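As a quick sanity check on the figures quoted in the abstract, the short Python sketch below recomputes the reduction factors from the two worked examples. The helper name `reduction_factor` is hypothetical and is not part of the FlashRT code base; the numbers are the ones stated above, and "less than ten minutes" is treated as an upper bound on the new runtime, so the resulting speedup is only a lower bound.

```python
def reduction_factor(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`.

    Hypothetical helper for illustration only; not part of FlashRT.
    """
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# GPU memory example from the abstract (32K-token context), in GB.
memory_factor = reduction_factor(264.1, 65.7)

# Runtime example from the abstract, in minutes: one hour down to
# "less than ten minutes". Ten minutes is an upper bound on the new
# runtime, so this value is a lower bound on the speedup.
runtime_factor_lower_bound = reduction_factor(60.0, 10.0)

print(f"memory reduction: {memory_factor:.2f}x")
print(f"runtime speedup: at least {runtime_factor_lower_bound:.1f}x")
```

Both values sit at or near the upper end of the ranges reported in the abstract (2x-4x for memory, 2x-7x for runtime).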

Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia • 2026

Related benchmarks

Task	Dataset	Result
Adversarial Attack	NQ	ASR 100	24
Prompt Injection Attack	NarrativeQA	ASR 86	11
Prompt Injection Attack	GovReport	Attack Success Rate (ASR) 74	11
Prompt Injection Attack	MuSiQue	Attack Success Rate (ASR) 98	9
Prompt Injection Attack	MuSiQue	ASR 92	6
Knowledge Corruption Attack	HotpotQA	ASR 100	5
Knowledge Corruption Attack	MS Marco	ASR 100	5
Prompt Injection Attack	Long Code Arena (LCA) project-level code completion (16K-token contexts, first 50 repositories, medium context set)	Attack Success Rate (ASR) 80	4
Prompt Injection Attack	EHRAgent	ASR 100	4
Prompt Injection Attack	GovReport	ASR 100	4
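The results above are reported as attack success rate (ASR). The listing does not say how success is judged for each benchmark, so the following is only a minimal sketch of the usual computation, assuming one boolean outcome per attacked sample and a result expressed as a percentage. The function name and the sample data are hypothetical.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Percentage of attack attempts that succeeded.

    `outcomes` holds one flag per attacked sample (True = attack succeeded).
    How success is decided is benchmark-specific and not modeled here.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical outcomes for four attacked samples (illustration only).
example_outcomes = [True, True, False, True]
example_asr = attack_success_rate(example_outcomes)
print(f"ASR: {example_asr:.1f}")
```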

