PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Multimodal Agents
About
As multimodal agents evolve from passive observers to long-horizon decision-makers, they require memory systems that provide not just information availability but logical verifiability. A fundamental limitation of current architectures is the epistemic asymmetry inherent in probabilistic vision-language models and dense associative memories: they conflate semantic affinity with factual existence and structurally fail to encode negative constraints. To this end, we introduce PolarMem, a training-free Polarized Latent Graph Memory designed to ground agent reasoning in verifiable evidence. PolarMem transforms fuzzy perceptual likelihoods into discrete logical constraints through non-parametric distributional partitioning. Furthermore, it employs a polarized graph topology with orthogonal inhibitory connections to explicitly store verified negation as a primary cognitive state. At inference time, we enforce a logic-dominant retrieval paradigm, suppressing hallucinatory patterns that violate negative constraints. Extensive evaluation across eight frozen Vision-Language Models and six benchmarks demonstrates that PolarMem functions as a robust cognitive system, establishing a foundation for verifiable multimodal agents. Our code is available at https://github.com/czs-ict/PolarMem.
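The three ideas named in the abstract (partitioning likelihoods into discrete constraints, storing negation as explicit inhibitory edges, and letting logic override similarity at retrieval) can be illustrated with a minimal sketch. This is not the PolarMem implementation: the quantile rule, the `PolarGraph` class, and every function and parameter name below are assumptions chosen for illustration, and the scores are made-up toy values.

```python
from dataclasses import dataclass, field


def partition_scores(scores, low_q=0.25, high_q=0.75):
    """Split {concept: likelihood} into (present, absent) sets.

    Assumption: thresholds are empirical quantiles of the observed scores,
    standing in for the paper's non-parametric distributional partitioning.
    Concepts between the two thresholds stay unverified and are not stored.
    """
    ranked = sorted(scores.values())
    lo = ranked[int(low_q * (len(ranked) - 1))]
    hi = ranked[int(high_q * (len(ranked) - 1))]
    present = {c for c, s in scores.items() if s >= hi}
    absent = {c for c, s in scores.items() if s <= lo} - present
    return present, absent


@dataclass
class PolarGraph:
    """Hypothetical two-polarity memory: excitatory and inhibitory edges."""
    positive: dict = field(default_factory=dict)  # node -> verified-present concepts
    negative: dict = field(default_factory=dict)  # node -> verified-absent concepts

    def write(self, node, present, absent):
        self.positive.setdefault(node, set()).update(present)
        self.negative.setdefault(node, set()).update(absent)

    def retrieve(self, node, candidates):
        """Rank candidates {concept: similarity} with logic taking priority.

        Concepts with an inhibitory edge are dropped regardless of similarity;
        verified positives are ranked ahead of unverified candidates.
        """
        blocked = self.negative.get(node, set())
        verified = self.positive.get(node, set())
        kept = [(c, s) for c, s in candidates.items() if c not in blocked]
        kept.sort(key=lambda cs: (cs[0] not in verified, -cs[1]))
        return [c for c, _ in kept]


# Toy perceptual likelihoods for one image (illustrative values only).
scores = {"dog": 0.92, "leash": 0.81, "frisbee": 0.55, "cat": 0.08, "bicycle": 0.05}
present, absent = partition_scores(scores)

memory = PolarGraph()
memory.write("img_1", present, absent)

# "cat" has the highest similarity but violates a stored negative constraint.
answer = memory.retrieve("img_1", {"cat": 0.9, "dog": 0.7, "frisbee": 0.6})
print(present, absent, answer)
```

In this toy run, "cat" is suppressed despite having the highest similarity score, because the memory holds a verified negation for it. That is the behavior the abstract describes as logic-dominant retrieval, shown here under the simplifying assumptions stated above.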
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Hallucination Robustness | HallusionBench | Score | 57.8 | 32 |
| Multimodal Retrieval-Augmented Generation | MRAMG | Score | 35 | 32 |
| Multimodal Retrieval-Augmented Generation | MRAG | Score | 70.8 | 32 |
| Visual Retrieval-Augmented Generation | Visual-RAG | Score | 53.4 | 32 |
| General Reasoning | MMMU | Overall Score | 74.1 | 32 |
| General Reasoning | MMStar | Score | 68.4 | 32 |