DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

About

While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of disconnected events. This lack of temporal awareness facilitates a "Safety Gap" where adversarial tactics, like Crescendo and ActorAttack, slowly bleed malicious intent across turn boundaries to bypass stateless filters. We introduce DeepContext, a stateful monitoring framework designed to map the temporal trajectory of user intent. DeepContext discards the isolated evaluation model in favor of a Recurrent Neural Network (RNN) architecture that ingests a sequence of fine-tuned turn-level embeddings. By propagating a hidden state across the conversation, DeepContext captures the incremental accumulation of risk that stateless models overlook. Our evaluation demonstrates that DeepContext significantly outperforms existing baselines in multi-turn jailbreak detection, achieving a state-of-the-art F1 score of 0.84, which represents a substantial improvement over both hyperscaler cloud-provider guardrails and leading open-weight models such as Llama-Prompt-Guard-2 (0.67) and Granite-Guardian (0.67). Furthermore, DeepContext maintains a sub-20ms inference overhead on a T4 GPU, ensuring viability for real-time applications. These results suggest that modeling the sequential evolution of intent is a more effective and computationally efficient alternative to deploying massive, stateless models.
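To make the stateful idea concrete, the following is a minimal sketch of a recurrent scorer that carries a hidden state across turns. It is not the DeepContext architecture: the class name `ToyIntentMonitor`, the Elman-style update, the one-dimensional "embeddings", the hand-set weights, and the 0.5 threshold are all assumptions chosen for illustration. The abstract specifies only that an RNN ingests a sequence of fine-tuned turn-level embeddings.

```python
import math
from typing import List, Optional, Sequence


def _matvec(matrix: Sequence[Sequence[float]], vector: Sequence[float]) -> List[float]:
    """Multiply a small dense matrix by a vector using plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyIntentMonitor:
    """Hypothetical Elman-style recurrent scorer (illustrative, not the paper's model)."""

    def __init__(self, w_in, w_rec, w_out, bias: float) -> None:
        self.w_in = w_in      # input-to-hidden weights (assumed values)
        self.w_rec = w_rec    # hidden-to-hidden weights (assumed values)
        self.w_out = w_out    # hidden-to-risk readout weights (assumed values)
        self.bias = bias
        self.hidden = [0.0] * len(w_rec)  # state carried across turns

    def step(self, turn_embedding: Sequence[float]) -> float:
        """Consume one turn embedding, update the hidden state, return a risk in (0, 1)."""
        pre = [
            a + b
            for a, b in zip(_matvec(self.w_in, turn_embedding),
                            _matvec(self.w_rec, self.hidden))
        ]
        self.hidden = [math.tanh(p) for p in pre]
        logit = sum(w * h for w, h in zip(self.w_out, self.hidden)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))


def make_monitor() -> ToyIntentMonitor:
    # Toy weights picked by hand so that a single mild turn stays below threshold.
    return ToyIntentMonitor(w_in=[[0.5]], w_rec=[[1.0]], w_out=[4.0], bias=-2.0)


# Five turns that are each only mildly suspicious (made-up 1-D embeddings).
turns = [[0.4] for _ in range(5)]

# Stateful: one monitor sees the whole conversation, so risk accumulates.
stateful = make_monitor()
stateful_scores = [stateful.step(e) for e in turns]

# Stateless: a fresh monitor per turn, so every turn is scored in isolation.
stateless_scores = [make_monitor().step(e) for e in turns]

THRESHOLD = 0.5  # assumed decision threshold
first_flagged_turn: Optional[int] = next(
    (i + 1 for i, s in enumerate(stateful_scores) if s > THRESHOLD), None
)
```

With these toy weights, every stateless score is identical and stays below the threshold, while the stateful score rises turn by turn and eventually crosses it. That is the gap between isolated and sequential evaluation that the abstract describes, reproduced here only in miniature.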

Justin Albrethsen, Yash Datta, Kunal Kumar, Sharath Rajasekar • 2026

Related benchmarks

Task	Dataset	Result
Jailbreak Detection	JailBreakBench Single Turn 35	F1 Score: 98	10
Multi-turn Jailbreak Detection	HarmBench and DEFCON Multi-turn Jailbreak N=1,010 (test)	F1 Score: 84	10
Inference Latency	Multi-turn Adversarial Defense Latency Benchmark (inference)	Latency (ms): 19	10

