Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems

About

Latent-based multi-agent systems replace parts of explicit inter-agent communication with hidden representations, offering a new direction for efficient and flexible agent collaboration. However, moving coordination into latent space may also move attacks beyond the reach of visible-text inspection. In this paper, we study whether latent states can carry attack-associated information that remains effective during clean executions. To examine this question, we introduce a latent attack framework that reactivates attack-induced effects through latent interventions without reusing adversarial text. Extensive experiments show that the resulting latent-only attacks can substantially degrade task performance in clean executions, especially when applied to inter-agent KV-cache handoffs rather than local hidden states. Further control analyses indicate that this degradation cannot be reduced to arbitrary perturbations or invalid generation. Overall, our findings suggest that latent-based collaboration does not remove attack risk. It shifts part of the risk into less observable execution states, calling for safeguards beyond visible-text inspection.
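The abstract does not spell out how the latent interventions are constructed, so the following is only a toy sketch of the general idea under loudly stated assumptions: the attack-associated signal is approximated as a difference of means between latent states recorded with and without adversarial text, the intervention adds that vector to a clean latent state, and the control replaces it with a random vector of equal norm. The function names (attack_direction, intervene, random_control), the additive form, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def norm(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def attack_direction(attacked_states, clean_states):
    """Assumption: difference of means between attacked and clean latents."""
    mu_attacked = mean_vector(attacked_states)
    mu_clean = mean_vector(clean_states)
    return [a - c for a, c in zip(mu_attacked, mu_clean)]


def intervene(state, direction, alpha=1.0):
    """Add a scaled direction to one latent vector (e.g. a flattened KV entry)."""
    return [s + alpha * d for s, d in zip(state, direction)]


def random_control(direction, rng):
    """Random vector rescaled to the same L2 norm as the attack direction."""
    raw = [rng.gauss(0.0, 1.0) for _ in direction]
    scale = norm(direction) / norm(raw)
    return [scale * r for r in raw]


rng = random.Random(0)
# Toy stand-ins for recorded latent states; real states would come from a model.
clean = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(32)]
attacked = [[x + 0.5 for x in row] for row in clean]  # toy shift, not a real attack

direction = attack_direction(attacked, clean)
control = random_control(direction, rng)
patched = intervene(clean[0], direction, alpha=1.0)
```

Comparing a task metric under direction against the same metric under control is one simple way to check that an effect is not explained by perturbation size alone, which is the kind of question the control analyses in the abstract address.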

Chenxi Wang, Ruiyang Huang, Jiayan Sun, Lei Wei, Yifan Wu • 2026

Related benchmarks

Task	Dataset	Result
Code Generation	HumanEval+ (test)	--	132
Multiple-choice Question Answering	OpenBookQA (test)	Accuracy 88.8	61
Mathematical Reasoning	GSM8K (test)	Accuracy 92	43
Direction-aware projection detection for edge-level KV interventions	GSM8K traces (held-out)	FPR 0.00e+0	12
Direction-agnostic layer-profile detection for edge-level KV interventions	GSM8K (held-out traces)	FPR 4.4	9
Direction-agnostic layer-profile detection for node-level hidden-state interventions	GSM8K (held-out)	FPR 4	3
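The detection rows above report a false positive rate (FPR) on held-out traces. As a rough illustration of how such a number could be computed for a projection-based detector, here is a minimal sketch under assumptions: a latent state is scored by its dot product with a known unit direction, the threshold is calibrated on clean traces, and FPR is the share of held-out clean traces that exceed it. The helper names, the max-based calibration, and the synthetic traces are hypothetical; the page does not describe the paper's actual detector.

```python
import random


def project(state, unit_direction):
    """Score a latent vector by its dot product with a unit direction."""
    return sum(s * u for s, u in zip(state, unit_direction))


def calibrate_threshold(clean_scores, quantile=1.0):
    """Pick a threshold at the given quantile of clean scores (1.0 = maximum)."""
    ordered = sorted(clean_scores)
    idx = min(len(ordered) - 1, int(quantile * (len(ordered) - 1)))
    return ordered[idx]


def flag_rate(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


rng = random.Random(0)
dim = 4
unit_direction = [1.0, 0.0, 0.0, 0.0]  # assumed known direction (toy choice)


def clean_trace():
    """Synthetic stand-in for a clean latent state."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


calibration = [clean_trace() for _ in range(200)]
held_out = [clean_trace() for _ in range(200)]
# Toy intervention: shift held-out states along the direction by a large step.
intervened = [[x + 20.0 * u for x, u in zip(t, unit_direction)] for t in held_out]

threshold = calibrate_threshold([project(t, unit_direction) for t in calibration])
calibration_fpr = flag_rate([project(t, unit_direction) for t in calibration], threshold)
held_out_fpr = flag_rate([project(t, unit_direction) for t in held_out], threshold)
detection_rate = flag_rate([project(t, unit_direction) for t in intervened], threshold)
```

With the threshold set to the maximum calibration score, the FPR on the calibration set is zero by construction, while the held-out FPR depends on how well the calibration traces cover the clean distribution.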
