OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents
About
Large Language Models (LLMs) are increasingly deployed as autonomous agents that execute tool-augmented, multi-step tasks, where latency is a critical factor for real-world applications. Yet an overlooked threat is Reasoning-Level Denial-of-Service (R-DoS), in which an attacker preserves task correctness but degrades availability by inflating an agent's reasoning depth or tool-use budget. We introduce OTora, the first unified, two-stage red-teaming framework for instantiating R-DoS attacks. Stage I optimizes an adversarial trigger that induces targeted tool invocations using insertion-aware scoring and dynamic target co-evolution, supporting both black-box and white-box settings. Stage II generates agent-aware reasoning payloads via an ICL-guided genetic search that amplifies overthinking while maintaining correct task outcomes. Across WebShop, Email, and OS agents built on multiple backbone models such as LLaMA-70B and GPT-OSS-120B, OTora achieves up to a 10× increase in reasoning tokens and order-of-magnitude latency slowdowns while preserving near-baseline task accuracy. Finally, we discuss mitigation strategies for detecting and constraining abnormal reasoning and latency spikes. The code is available at https://github.com/llm2409/OTora.
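The abstract mentions mitigations that detect abnormal reasoning and latency spikes without detailing them. As an illustration only, the sketch below flags agent runs whose reasoning-token count or latency exceeds a fixed multiple of a clean baseline median. The `RunStats` record, the `flag_rdos_suspects` function, and the `max_ratio` threshold are hypothetical names and values chosen for this example; they are not taken from the OTora code base and do not represent the authors' mitigation.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RunStats:
    """Per-task measurements for one agent run (hypothetical record type)."""
    reasoning_tokens: int
    latency_s: float


def flag_rdos_suspects(baseline, observed, max_ratio=3.0):
    """Return indices of observed runs that look like R-DoS victims.

    A run is flagged when its reasoning tokens or its latency exceed
    max_ratio times the corresponding median over the clean baseline runs.
    The default ratio of 3.0 is an arbitrary assumption, not a tuned value.
    """
    if not baseline:
        raise ValueError("baseline must contain at least one run")

    token_ref = median(r.reasoning_tokens for r in baseline)
    latency_ref = median(r.latency_s for r in baseline)

    suspects = []
    for i, run in enumerate(observed):
        too_many_tokens = run.reasoning_tokens > max_ratio * token_ref
        too_slow = run.latency_s > max_ratio * latency_ref
        if too_many_tokens or too_slow:
            suspects.append(i)
    return suspects


# Illustrative, made-up measurements (not results from the paper).
baseline_runs = [RunStats(100, 2.0), RunStats(120, 2.5), RunStats(110, 2.2)]
observed_runs = [RunStats(115, 2.3), RunStats(1100, 20.0), RunStats(130, 9.0)]
suspect_indices = flag_rdos_suspects(baseline_runs, observed_runs)
```

Because R-DoS preserves task correctness, a check like this has to look at resource usage rather than output quality. A median-based reference is used here so that a few slow but benign baseline runs do not shift the threshold much.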
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reasoning-Level Denial-of-Service | WebShop Agent | Delay (s) | 170 | 18 |
| Reasoning-Level Denial-of-Service | Email Agent | Processing Delay (s) | 182 | 18 |
| Reasoning-Level Denial-of-Service | OS Agent | Delay (s) | 200 | 18 |
| Reasoning-Level Denial-of-Service Attack | WebShop | E2E Success Rate | 87 | 6 |
| Reasoning-Level Denial-of-Service | WebShop Environment Injection (test) | E2E Success | 87 | 4 |
| Reasoning-Level Denial-of-Service | Email Environment Injection (test) | E2E Performance | 86 | 4 |
| Reasoning-Level Denial-of-Service | OS Environment Injection (test) | E2E Success | 80 | 4 |