GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling

About

The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex, multi-turn dialogues. However, multi-agent collaboration faces critical safety challenges, such as hallucination amplification and the injection and propagation of errors. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. An unsupervised encoder-decoder architecture, trained with an incremental paradigm, learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with high precision. Moreover, we introduce a graph abstraction mechanism based on Information Bottleneck theory, which compresses temporal interaction graphs while preserving essential patterns. Extensive experiments demonstrate GUARDIAN's effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization. The code is available at https://github.com/JialongZhou666/GUARDIAN
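To make the reconstruction-based detection idea concrete, here is a minimal sketch of anomaly scoring over a discrete-time temporal attributed graph. It is not the GUARDIAN architecture: the learned encoder-decoder is replaced by a trivial stand-in that "reconstructs" each agent's attributes as the mean of its in-neighbors' attributes, and edge reconstruction is omitted. The names `Snapshot`, `neighbor_mean_reconstruction`, and `anomaly_scores`, as well as the toy attribute vectors, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One time step (e.g. one dialogue round) of the collaboration graph."""
    attrs: dict  # agent id -> attribute vector (assumed: some embedding of its response)
    edges: set = field(default_factory=set)  # (sender, receiver) pairs


def neighbor_mean_reconstruction(snapshot, node):
    """Stand-in decoder: predict a node's attributes as the mean of its in-neighbors'."""
    senders = [src for (src, dst) in snapshot.edges if dst == node]
    if not senders:
        return None  # nothing to reconstruct from
    dim = len(snapshot.attrs[node])
    return [sum(snapshot.attrs[s][i] for s in senders) / len(senders)
            for i in range(dim)]


def anomaly_scores(snapshots):
    """Return {(t, node): squared reconstruction error} for nodes with in-neighbors."""
    scores = {}
    for t, snap in enumerate(snapshots):
        for node, x in snap.attrs.items():
            x_hat = neighbor_mean_reconstruction(snap, node)
            if x_hat is None:
                continue
            scores[(t, node)] = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return scores


# Toy data: four fully connected agents over two rounds.
agents = ["A", "B", "C", "D"]
all_edges = {(u, v) for u in agents for v in agents if u != v}

snapshots = [
    # Round 0: every agent gives a similar answer.
    Snapshot({a: [1.0, 0.0] for a in agents}, all_edges),
    # Round 1: agent D deviates from the others.
    Snapshot({"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [1.0, 0.0], "D": [0.0, 1.0]},
             all_edges),
]

scores = anomaly_scores(snapshots)
most_anomalous = max(scores, key=scores.get)  # (1, 'D')
```

In this toy run the highest score belongs to agent D at round 1, because its attributes are the hardest to reconstruct from its peers. A learned model would replace the neighbor mean with a decoder over latent embeddings and would score edges as well as nodes.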

Jialong Zhou, Lichao Wang, Xiao Yang • 2025

Related benchmarks

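| Task                     | Dataset    | Metric                    | Result | Rank |
|--------------------------|------------|---------------------------|--------|------|
| Prompt Injection         | MMLU       | ASR@3                     | 20.7   | 91   |
| Prompt Injection         | MATH       | Attack Success Rate (ASR) | 19.3   | 36   |
| Malicious Advice Defense | PoisonRAG  | ASR                       | 13.3   | 36   |
| Trojan Attack            | InjecAgent | ASR                       | 24.3   | 36   |
| Prompt Injection         | CSQA       | ASR                       | 31.3   | 36   |
| Cascade Attack Detection | AutoGen    | TPR@5%                    | 63.2   | 12   |
| Cascade Attack Detection | LLM Debate | TPR@5%                    | 59.8   | 12   |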