On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents

About

Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents, each focusing on a specific domain. However, the impact of clumsy or even malicious agents (those who frequently make errors in their tasks) on the overall performance of the system remains underexplored. This paper investigates: (1) How resilient are various system structures (e.g., A$\rightarrow$B$\rightarrow$C, A$\leftrightarrow$B$\leftrightarrow$C) to faulty agents on different downstream tasks? (2) How can we increase system resilience to defend against these agents? To simulate faulty agents, we propose two approaches, AutoTransform and AutoInject, which introduce mistakes into the agents' responses. Experiments on four downstream tasks using six systems show that the "hierarchical" structure, i.e., A$\rightarrow$(B$\leftrightarrow$C), exhibits superior resilience with the lowest performance drop of 5.5%, compared to 10.5% and 23.7% for the other two structures. To further improve resilience, we introduce (1) Challenger, a mechanism that lets each agent challenge others' outputs, and (2) Inspector, an additional agent that reviews and corrects messages, recovering up to 96.4% of the errors made by faulty agents. Our code and data are available at https://github.com/CUHK-ARISE/MAS-Resilience.
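
To make the structures in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It encodes one reading of the three structures as directed message edges and shows a toy fault injector that corrupts a message with a given probability. The names `STRUCTURES`, `receivers`, `inject_fault`, and the `error_rate` parameter are hypothetical and chosen only for illustration; the actual AutoTransform and AutoInject code lives in the repository linked above.

```python
import random

# One possible reading of the structures named in the abstract, written as
# directed (sender, receiver) edges. The labels "linear" and "flat" and this
# dict layout are assumptions made for illustration.
STRUCTURES = {
    "linear": [("A", "B"), ("B", "C")],                                # A -> B -> C
    "flat": [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")],          # A <-> B <-> C
    "hierarchical": [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B")],  # A -> (B <-> C)
}


def receivers(structure, sender):
    """Return the sorted list of agents that `sender` can message directly."""
    return sorted(dst for src, dst in STRUCTURES[structure] if src == sender)


def inject_fault(message, corrupt, error_rate, rng):
    """Return corrupt(message) with probability error_rate, else the message.

    `corrupt` is any caller-supplied function that introduces a mistake;
    it stands in for whatever transformation a real fault simulator applies.
    """
    if rng.random() < error_rate:
        return corrupt(message)
    return message


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(receivers("hierarchical", "A"))
    print(inject_fault("return a + b", lambda m: m.replace("+", "-"), 0.5, rng))
```

Keeping the corruption function separate from the injection probability makes it easy to swap in different kinds of mistakes while holding the fault rate fixed.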

Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Michael R. Lyu, Maarten Sap • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Prompt Injection | MMLU | ASR@3 | 19.2 | 31 |
| Malicious Advice Defense | PoisonRAG | ASR@3 | 25.5 | 18 |
| Prompt Injection | MMLU random topology | ASR (k=1) | 15.5 | 16 |
| Prompt Injection Defense | PI (CSQA) random topology | ASR@1 | 46.5 | 16 |
| Prompt Injection Defense | CSQA | ASR@3 | 26.9 | 16 |
| Tool Attack Defense | InjecAgent random topology (test) | ASR@1 | 0.15 | 16 |
| Prompt Injection Defense | GSM8K PI (Prompt Injection) (test) | ASR@1 | 5.5 | 16 |