AgenticRed: Evolving Agentic Systems for Red-Teaming

About

While recent automated red-teaming methods show promise for systematically exposing model vulnerabilities, most existing approaches rely on human-specified workflows. This reliance on manually designed workflows introduces human biases and makes exploring the broader design space expensive. We introduce AgenticRed, an automated pipeline that leverages LLMs' in-context learning to iteratively design and refine red-teaming systems without human intervention. Rather than optimizing attacker policies within predefined structures, AgenticRed treats red-teaming as a system design problem and autonomously evolves red-teaming systems using evolutionary selection and generational knowledge. Red-teaming systems designed by AgenticRed consistently outperform state-of-the-art approaches on HarmBench, achieving a 96% attack success rate (ASR) on Llama-2-7B, 98% on Llama-3-8B, and 100% on Qwen3-8B. Our approach generates robust, query-agnostic red-teaming systems that transfer strongly to the latest frontier models, achieving 100% ASR on GPT-5.1, DeepSeek-R1, and DeepSeek-V3.2. This work highlights evolutionary algorithms as a powerful approach to AI safety that can keep pace with rapidly evolving models.
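
To make "evolutionary selection and generational knowledge" concrete, here is a minimal sketch of a generic evolutionary design loop. It is not AgenticRed's implementation: the names `Candidate`, `evolve`, `propose`, and `evaluate` are hypothetical. In a real system, `propose` would prompt an LLM with the best prior designs and their scores (the in-context "generational knowledge"), and `evaluate` would measure ASR against a target model. Here both are replaced by toy functions so the sketch runs on its own.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    design: str           # hypothetical: textual description or code of a red-teaming system
    fitness: float = 0.0  # hypothetical: e.g. ASR measured on a validation split


def evolve(
    seed_designs: List[str],
    propose: Callable[[List[Candidate]], str],  # stand-in for an LLM meta-designer
    evaluate: Callable[[str], float],           # stand-in for measuring ASR
    generations: int = 5,
    children_per_gen: int = 4,
    keep_top: int = 3,
) -> List[Candidate]:
    # The archive keeps every evaluated design, so knowledge accumulates across generations.
    archive = [Candidate(d, evaluate(d)) for d in seed_designs]
    for _ in range(generations):
        # Evolutionary selection: only the current best designs are shown to the proposer.
        parents = sorted(archive, key=lambda c: c.fitness, reverse=True)[:keep_top]
        for _ in range(children_per_gen):
            child = propose(parents)
            archive.append(Candidate(child, evaluate(child)))
    # Return all designs, best first.
    return sorted(archive, key=lambda c: c.fitness, reverse=True)


# Toy stand-ins: a "design" is a bit string and fitness is its share of '1's.
rng = random.Random(0)


def toy_propose(parents: List[Candidate]) -> str:
    base = list(parents[0].design)     # copy the best parent
    base[rng.randrange(len(base))] = "1"  # mutate one position
    return "".join(base)


def toy_evaluate(design: str) -> float:
    return design.count("1") / len(design)


result = evolve(["0000000000", "1000000000"], toy_propose, toy_evaluate)
print(result[0].design, result[0].fitness)
```

Because the archive retains every evaluated design, the best fitness can never drop below that of the seeds. Keeping the top designs as in-context examples is one simple way to pass knowledge between generations.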

Jiayi Yuan, Jonathan Nöther, Natasha Jaques, Goran Radanović • 2026

Related benchmarks

Task	Dataset	Result
Red Teaming	HarmBench Llama-3-8B (test)	ASR 98%	5
Red Teaming	HarmBench Claude-Sonnet-3.5 (held-out test)	ASR 60%	5
Red Teaming	HarmBench Llama-2-7B (test)	ASR 96%	5
Red Teaming	HarmBench gpt-3.5-turbo-0125 (test)	ASR 100%	3
Red Teaming	HarmBench gpt-4o-2024-08-06 (test)	ASR 100%	3
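
ASR is the share of target behaviors for which the attack succeeds, as decided by a judge (HarmBench uses a classifier for this). It can be written as a fraction or a percentage, so 0.98 and 98% denote the same rate. The sketch below assumes the judge's per-behavior verdicts are already available as booleans; the function name `attack_success_rate` is hypothetical.

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return ASR as a percentage: the share of behaviors judged successfully elicited."""
    if not judgments:
        raise ValueError("need at least one judged behavior")
    return 100.0 * sum(judgments) / len(judgments)


# Hypothetical judge verdicts for 50 behaviors, 48 of them successful.
example = [True] * 48 + [False] * 2
print(attack_success_rate(example))  # 96.0
```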
