SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations
About
Large Language Models (LLMs) have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and malware explanation. However, deployment in adversarial cybersecurity environments exposes critical vulnerabilities to prompt injection attacks, in which malicious instructions embedded in security artifacts manipulate model behavior. This paper introduces SecureCAI, a defense framework that extends Constitutional AI with security-aware guardrails, adaptive constitution evolution, and Direct Preference Optimization (DPO) for unlearning unsafe response patterns, targeting high-stakes security contexts where traditional safety mechanisms prove insufficient against sophisticated adversarial manipulation. Experimental evaluation shows that SecureCAI reduces attack success rates by 94.7% relative to baseline models while maintaining 95.1% accuracy on benign security analysis tasks. A continuous red-teaming feedback loop enables dynamic adaptation to emerging attack strategies, and the framework sustains constitution adherence scores above 0.92 under sustained adversarial pressure, establishing a foundation for trustworthy integration of LLM capabilities into operational cybersecurity workflows and addressing a critical gap in current approaches to AI safety in adversarial domains.
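The abstract names DPO as the mechanism for unlearning unsafe response patterns. A minimal sketch of the standard DPO objective is shown below, under the assumption that preference pairs contrast safe, constitution-adherent responses ("chosen") with injection-compliant ones ("rejected"); the function name, input conventions, and beta value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_unlearning_loss(policy_chosen_logps: torch.Tensor,
                        policy_rejected_logps: torch.Tensor,
                        ref_chosen_logps: torch.Tensor,
                        ref_rejected_logps: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Inputs are per-sequence summed token log-probabilities, shape [batch].
    'chosen'   = safe responses that follow the constitution under injection.
    'rejected' = unsafe responses that comply with the injected instruction.
    (Pairing convention is an assumption for this sketch.)
    """
    # Log-ratios of the policy against the frozen reference model.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between safe and injection-compliant responses.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

Because the reference model anchors both log-ratios, the loss pushes probability mass away from injection-compliant completions without requiring explicit gradient ascent on the unsafe samples.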
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Security Analysis | Security Tasks (15,000 benign samples, test) | F1 (Log) | 96.8 | 5 |
| Attack Resilience Evaluation | 51,750 adversarial samples | Resilience Score (Log) | 4.2 | 5 |
| Adversarial Attack Defense | Held-out attacks (test) | ASR (Multi-turn Manip.) | 7.8 | 2 |
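The ASR row above is presumably the fraction of held-out adversarial prompts whose injected instruction actually alters the model's behavior (lower is better). A minimal sketch of how such a score could be computed follows; `model_respond` and `attack_succeeded` are hypothetical stand-ins, since the paper's judging procedure is not given here.

```python
from typing import Callable, Iterable

def attack_success_rate(model_respond: Callable[[str], str],
                        adversarial_prompts: Iterable[str],
                        attack_succeeded: Callable[[str], bool]) -> float:
    """Percentage of adversarial prompts where the injection succeeds.

    model_respond:    queries the assistant with one adversarial artifact.
    attack_succeeded: judges whether the response followed the injected
                      instruction (e.g., leaked data or skipped analysis).
    """
    prompts = list(adversarial_prompts)
    hits = sum(attack_succeeded(model_respond(p)) for p in prompts)
    return 100.0 * hits / len(prompts)
```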