Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems

About

Vision-Language Models (VLMs) enable powerful multi-agent systems, but scaling them is economically unsustainable: coordinating heterogeneous agents under information asymmetry often spirals costs. Existing paradigms, such as Mixture-of-Agents and knowledge-based routers, rely on heuristic proxies that ignore costs and collapse uncertainty structure, leading to provably suboptimal coordination. We introduce Agora, a framework that reframes coordination as a decentralized market for uncertainty. Agora formalizes epistemic uncertainty into a structured, tradable asset (perceptual, semantic, inferential), and enforces profitability-driven trading among agents based on rational economic rules. A market-aware broker, extending Thompson Sampling, initiates collaboration and guides the system toward cost-efficient equilibria. Experiments on five multimodal benchmarks (MMMU, MMBench, MathVision, InfoVQA, CC-OCR) show that Agora outperforms strong VLMs and heuristic multi-agent strategies, e.g., achieving +8.5% accuracy over the best baseline on MMMU while reducing cost by over 3x. These results establish market-based coordination as a principled and scalable paradigm for building economically viable multi-agent visual intelligence systems.

Jusheng Zhang, Yijia Fan, Kaitong Cai, Jing Yang, Jiawei Yao, Jian Wang, Guanlong Qu, Ziliang Chen, Keze Wang• 2026

Related benchmarks

TaskDatasetResultRank
Visual Mathematical ReasoningMathVision
Accuracy44.3
254
Multi-discipline Multimodal UnderstandingMMMU (val)
Accuracy79.2
212
Multimodal UnderstandingMMBench v1.1 (test)--
19
Visual Question AnsweringInfoVQA (test)
Accuracy88.9
13
Optical Character RecognitionCC-OCR
Accuracy81.2
9
Showing 5 of 5 rows

Other info

Follow for update