SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication

About

LLM-based multi-agent systems exhibit strong collaborative capabilities but often suffer from redundant communication and excessive token overhead. Existing methods typically enhance efficiency through pretrained GNNs or greedy algorithms, but often isolate pre- and post-task optimization, lacking a unified strategy. To this end, we present SafeSieve, a progressive and adaptive multi-agent pruning algorithm that dynamically refines the inter-agent communication through a novel dual-mechanism. SafeSieve integrates initial LLM-based semantic evaluation with accumulated performance feedback, enabling a smooth transition from heuristic initialization to experience-driven refinement. Unlike existing greedy Top-k pruning methods, SafeSieve employs 0-extension clustering to preserve structurally coherent agent groups while eliminating ineffective links. Experiments across benchmarks (SVAMP, HumanEval, etc.) showcase that SafeSieve achieves 94.01% average accuracy while reducing token usage by 12.4%-27.8%. Results further demonstrate robustness under prompt injection attacks (1.23% average accuracy drop). In heterogeneous settings, SafeSieve reduces deployment costs by 13.3% while maintaining performance. These results establish SafeSieve as an efficient, GPU-free, and scalable framework for practical multi-agent systems. Our code can be found here: https://github.com/csgen/SafeSieve

Ruijia Zhang, Xinyan Zhao, Ruixiang Wang, Sigen Chen, Guibin Zhang, An Zhang, Kun Wang, Qingsong Wen• 2025

Related benchmarks

Task	Dataset	Result
Code Generation	HumanEval	Pass@190.15	1043
Arithmetic Reasoning	MultiArith	Accuracy97.8	293
Math Reasoning	GSM8K	Accuracy93.2	254
Math Reasoning	AQUA	Accuracy80.4	188
Knowledge Reasoning	MMLU	MMLU Knowledge Reasoning Accuracy84.65	73
Algebraic Reasoning	AQUA	Accuracy91.89	65
Coding	HumanEval	Accuracy90.15	60
Reasoning	MMLU	Accuracy84.65	54
Medical Question Answering	DDXPlus	Accuracy77.94	43
Multiple-choice Question Answering	AQUA	Accuracy80.4	43

Showing 10 of 18 rows

Other info

Follow for update

@wizwand_team Discord