Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems

About

Latent-based multi-agent systems replace parts of explicit inter-agent communication with hidden representations, offering a new direction for efficient and flexible agent collaboration. However, moving coordination into latent space may also move attacks beyond the reach of visible-text inspection. In this paper, we study whether latent states can carry attack-associated information that remains effective during clean executions. To examine this question, we introduce a latent attack framework that reactivates attack-induced effects through latent interventions without reusing adversarial text. Extensive experiments show that the resulting latent-only attacks can substantially degrade task performance in clean executions, especially when applied to inter-agent KV-cache handoffs rather than local hidden states. Further control analyses indicate that this degradation cannot be reduced to arbitrary perturbations or invalid generation. Overall, our findings suggest that latent-based collaboration does not remove attack risk. It shifts part of the risk into less observable execution states, calling for safeguards beyond visible-text inspection.
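
To make the idea of a latent-only intervention concrete, the following is a minimal toy sketch and not the paper's implementation. It assumes that attack-associated information can be summarized as a difference-of-means direction between latent vectors recorded in attacked and clean runs, and that a handoff between agents is simply a list of cached vectors. Both are simplifying assumptions introduced here for illustration. The names attack_direction, intervene, and projection, and the scale alpha, are hypothetical.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def attack_direction(clean_states, attacked_states):
    """Difference-of-means direction (attacked minus clean).

    Assumption: a single direction summarizes the attack-induced shift.
    """
    mu_clean = mean_vector(clean_states)
    mu_attacked = mean_vector(attacked_states)
    return [a - c for a, c in zip(mu_attacked, mu_clean)]


def intervene(handoff, direction, alpha=1.0):
    """Return a copy of the handoff with every vector shifted by alpha * direction.

    No adversarial text is involved; only the latent vectors change.
    """
    return [[x + alpha * d for x, d in zip(vec, direction)] for vec in handoff]


def projection(vec, direction):
    """Scalar projection of vec onto the unit-normalized direction."""
    norm = sum(d * d for d in direction) ** 0.5
    return sum(x * d for x, d in zip(vec, direction)) / norm


# Toy data standing in for recorded latent states (purely synthetic).
rng = random.Random(0)
dim = 8
clean = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
attacked = [[rng.gauss(0.5, 1) for _ in range(dim)] for _ in range(200)]

direction = attack_direction(clean, attacked)

# A clean handoff of three cached vectors, patched without any adversarial text.
clean_handoff = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
patched_handoff = intervene(clean_handoff, direction, alpha=2.0)

for before, after in zip(clean_handoff, patched_handoff):
    print(round(projection(before, direction), 3),
          round(projection(after, direction), 3))
```

In this toy setting, each patched vector's projection onto the direction grows by alpha times the direction's norm, while the visible text of the execution is untouched. This is why inspecting text alone would not reveal such a shift.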

Chenxi Wang, Ruiyang Huang, Jiayan Sun, Lei Wei, Yifan Wu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Code Generation | HumanEval+ (test) | -- | 132
Multiple-choice Question Answering | OpenBookQA (test) | Accuracy 88.8 | 61
Mathematical Reasoning | GSM8K (test) | Accuracy 92 | 23
Direction-aware projection detection for edge-level KV interventions | GSM8K traces (held-out) | FPR 0.00e+0 | 12
Direction-agnostic layer-profile detection for edge-level KV interventions | GSM8K (held-out traces) | FPR 4.4 | 9
Direction-agnostic layer-profile detection for node-level hidden-state interventions | GSM8K (held-out) | FPR 4 | 3
