DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

About

While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of disconnected events. This lack of temporal awareness facilitates a "Safety Gap" where adversarial tactics, like Crescendo and ActorAttack, slowly bleed malicious intent across turn boundaries to bypass stateless filters. We introduce DeepContext, a stateful monitoring framework designed to map the temporal trajectory of user intent. DeepContext discards the isolated evaluation model in favor of a Recurrent Neural Network (RNN) architecture that ingests a sequence of fine-tuned turn-level embeddings. By propagating a hidden state across the conversation, DeepContext captures the incremental accumulation of risk that stateless models overlook. Our evaluation demonstrates that DeepContext significantly outperforms existing baselines in multi-turn jailbreak detection, achieving a state-of-the-art F1 score of 0.84, which represents a substantial improvement over both hyperscaler cloud-provider guardrails and leading open-weight models such as Llama-Prompt-Guard-2 (0.67) and Granite-Guardian (0.67). Furthermore, DeepContext maintains a sub-20ms inference overhead on a T4 GPU, ensuring viability for real-time applications. These results suggest that modeling the sequential evolution of intent is a more effective and computationally efficient alternative to deploying massive, stateless models.
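The difference between stateless and stateful scoring described above can be illustrated with a minimal sketch. This is not the authors' implementation: the plain Elman-style recurrence, the one-dimensional "embeddings", and every weight below are hypothetical values chosen only to show how a hidden state carried across turns lets risk accumulate, while the same scorer applied to each turn in isolation sees nothing change.

```python
import math
from typing import List, Sequence


def rnn_step(h: Sequence[float], x: Sequence[float],
             W_h: Sequence[Sequence[float]],
             W_x: Sequence[Sequence[float]],
             b: Sequence[float]) -> List[float]:
    """One Elman-style update: h' = tanh(W_h @ h + W_x @ x + b)."""
    return [
        math.tanh(
            sum(w * hj for w, hj in zip(W_h[i], h))
            + sum(w * xj for w, xj in zip(W_x[i], x))
            + b[i]
        )
        for i in range(len(b))
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score_conversation(turn_embeddings, W_h, W_x, b, w_out, b_out) -> List[float]:
    """Return one risk score per turn, carrying the hidden state forward."""
    h = [0.0] * len(b)  # the state starts empty at the first turn
    scores = []
    for x in turn_embeddings:
        h = rnn_step(h, x, W_h, W_x, b)
        scores.append(sigmoid(sum(w * hj for w, hj in zip(w_out, h)) + b_out))
    return scores


# Hypothetical toy parameters (NOT from the paper): 1-d hidden state, 1-d input.
W_h, W_x, b = [[0.9]], [[0.5]], [0.0]
w_out, b_out = [4.0], -2.0

# Four turns that each carry the same mild risk signal.
turns = [[0.3], [0.3], [0.3], [0.3]]

# Stateful: one pass over the whole conversation.
stateful_scores = score_conversation(turns, W_h, W_x, b, w_out, b_out)

# Stateless: every turn is scored as if it were the start of a conversation.
stateless_scores = [
    score_conversation([x], W_h, W_x, b, w_out, b_out)[0] for x in turns
]

print("stateful :", [round(s, 3) for s in stateful_scores])
print("stateless:", [round(s, 3) for s in stateless_scores])
```

With these toy weights the stateful scores rise from turn to turn while the stateless scores stay flat, which is the gradual drift that a per-turn filter cannot see. A real system would use learned weights and high-dimensional turn embeddings.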

Justin Albrethsen, Yash Datta, Kunal Kumar, Sharath Rajasekar • 2026

Related benchmarks

Task | Dataset | Result | Rank
Jailbreak Detection | JailBreakBench Single Turn 35 | F1 Score: 98 | 10
Multi-turn Jailbreak Detection | HarmBench and DEFCON Multi-turn Jailbreak N=1,010 (test) | F1 Score: 84 | 10
Inference Latency | Multi-turn Adversarial Defense Latency Benchmark (inference) | Latency (ms): 19 | 10