AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse

About

Cooperative multi-agent methods for embodied AI are almost universally evaluated under idealized communication: zero latency, no packet loss, and unlimited bandwidth. Real-world deployment on robots with wireless links, autonomous vehicles on congested networks, or drone swarms in contested spectrum offers no such guarantees. We introduce AgentComm-Bench, a benchmark suite and evaluation protocol that systematically stress-tests cooperative embodied AI along six communication impairment dimensions: latency, packet loss, bandwidth collapse, asynchronous updates, stale memory, and conflicting sensor evidence. AgentComm-Bench spans three task families (cooperative perception, multi-agent waypoint navigation, and cooperative zone search) and evaluates five communication strategies, including a lightweight method we propose based on redundant message coding with staleness-aware fusion. Our experiments reveal that communication-dependent tasks degrade catastrophically: stale memory and bandwidth collapse cause performance drops of over 96% in navigation, while content corruption (stale or conflicting data) reduces perception F1 by over 85%. Vulnerability depends on the interaction between impairment type and task design; for example, perception fusion is robust to packet loss but amplifies corrupted data. Redundant message coding more than doubles navigation performance under 80% packet loss. We release AgentComm-Bench as a practical evaluation protocol and recommend that work on cooperative embodied AI report performance under multiple impairment conditions.
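To make the idea of redundant message coding with staleness-aware fusion concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the simplest form of redundancy (sending independent duplicate copies over a link with independent losses) and an exponential age-based weighting for fusion. All function names and the `half_life` parameter are hypothetical.

```python
import random


def delivery_probability(loss_rate: float, copies: int) -> float:
    """Chance that at least one of `copies` sends arrives.

    Assumption: each copy is lost independently with probability `loss_rate`.
    """
    return 1.0 - loss_rate ** copies


def send_redundant(message, loss_rate: float, copies: int, rng: random.Random):
    """Send `copies` duplicates over a lossy link.

    Returns the message if any copy survives, otherwise None.
    """
    for _ in range(copies):
        if rng.random() >= loss_rate:  # this copy was not dropped
            return message
    return None


def fuse_with_staleness(estimates, now: float, half_life: float):
    """Fuse (value, timestamp) pairs with a weighted average.

    Assumption: a message's weight halves for every `half_life` seconds
    of age, so stale data contributes less than fresh data.
    """
    weights = [0.5 ** ((now - t) / half_life) for _, t in estimates]
    total = sum(weights)
    if total == 0:
        return None  # nothing to fuse
    return sum(w * v for w, (v, _) in zip(weights, estimates)) / total
```

Under these assumptions, at 80% packet loss a single send arrives with probability 0.2, while three independent copies raise that to 1 - 0.8^3 = 0.488. This is only the arithmetic of the simplified model, not a result reported by the benchmark.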

Aayam Bansal, Ishaan Gangwani • 2026

Related benchmarks

Task	Dataset	Result
Collaborative Perception (CP)	Multi-agent Communication Environment (test)	Mean Normalized Performance Drop: 28.5	5
Collaborative Perception (CP)	Multi-agent Simulation averaged across 6 impairment dimensions	AURC (% of max): 73	5
Navigation (NAV)	Multi-agent Communication Environment (test)	Mean Normalized Performance Drop: 63.6	5
Navigation (NAV)	Multi-agent Simulation averaged across 6 impairment dimensions	AURC (% of max): 69.8	5
Search	Multi-agent Simulation averaged across 6 impairment dimensions	AURC (% of max): 84.9	5
Search	Multi-agent Communication Environment (test)	Mean Normalized Performance Drop: 31.4	5
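
The listing does not define the two metrics in the table, so the sketch below shows only one plausible reading. It assumes that a normalized performance drop is the relative loss against a clean (unimpaired) score, and that AURC denotes the area under a performance-versus-severity curve, expressed as a percentage of the area a constant maximum score would cover. The function names and example inputs are hypothetical, and the benchmark's actual definitions may differ.

```python
def normalized_drop(clean: float, impaired: float) -> float:
    """Relative performance loss versus the clean score, in percent (assumed definition)."""
    return 100.0 * (clean - impaired) / clean


def mean_normalized_drop(clean: float, impaired_scores) -> float:
    """Average the normalized drop over several impairment conditions."""
    drops = [normalized_drop(clean, s) for s in impaired_scores]
    return sum(drops) / len(drops)


def aurc_percent(severities, scores, max_score: float) -> float:
    """Trapezoidal area under the score-vs-severity curve, as % of the maximum area.

    Assumption: `severities` is sorted ascending and `max_score` is the best
    achievable score at every severity level.
    """
    area = 0.0
    for i in range(1, len(severities)):
        width = severities[i] - severities[i - 1]
        area += width * (scores[i] + scores[i - 1]) / 2.0
    max_area = max_score * (severities[-1] - severities[0])
    return 100.0 * area / max_area
```

With these assumed definitions, a method that holds full performance up to half severity and then falls linearly to zero would score `aurc_percent([0.0, 0.5, 1.0], [1.0, 1.0, 0.0], 1.0)`, which evaluates to 75.0.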
