
Dialectic-Med: Mitigating Diagnostic Hallucinations via Counterfactual Adversarial Multi-Agent Debate

About

Multimodal Large Language Models (MLLMs) in healthcare suffer from severe confirmation bias, often hallucinating visual details to support initial, potentially erroneous diagnostic hypotheses. Existing Chain-of-Thought (CoT) approaches lack intrinsic correction mechanisms, leaving them vulnerable to error propagation. To bridge this gap, we propose Dialectic-Med, a multi-agent framework that enforces diagnostic rigor through adversarial dialectics. Unlike static consensus models, Dialectic-Med orchestrates a dynamic interplay between three role-specialized agents: a proponent that formulates diagnostic hypotheses; an opponent equipped with a novel visual falsification module that actively retrieves contradictory visual evidence to challenge the proponent; and a mediator that resolves conflicts via a weighted consensus graph. By explicitly modeling the cognitive process of falsification, our framework ensures that diagnostic reasoning remains tightly grounded in verified visual regions. Empirical evaluations on MIMIC-CXR-VQA, VQA-RAD, and PathVQA demonstrate that Dialectic-Med not only achieves state-of-the-art performance but also fundamentally enhances the trustworthiness of the reasoning process. Beyond accuracy, our approach significantly improves explanation faithfulness and decisively mitigates hallucinations, establishing a new standard over single-agent baselines.
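The proponent/opponent/mediator interplay described above can be sketched as a simple debate loop. This is an illustrative toy only: the agent internals (the MLLM calls, the visual falsification retriever, the weighted consensus graph) are stubbed with scalar scores, and every name, data structure, and scoring rule here is a hypothetical stand-in, not the paper's implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch of an adversarial three-agent debate.
# Evidence strengths are toy floats standing in for MLLM outputs.

@dataclass
class Claim:
    diagnosis: str
    support: float        # proponent's evidence for the hypothesis
    contradiction: float  # opponent's contradictory evidence against it

def proponent(findings: dict) -> Claim:
    # Propose the hypothesis with the strongest supporting evidence.
    diagnosis, support = max(findings.items(), key=lambda kv: kv[1])
    return Claim(diagnosis, support, 0.0)

def opponent(claim: Claim, counter_evidence: dict) -> Claim:
    # Visual falsification (stub): retrieve contradictory evidence
    # for the proposed diagnosis and attach its strength.
    claim.contradiction = counter_evidence.get(claim.diagnosis, 0.0)
    return claim

def mediator(claims: list) -> str:
    # Weighted consensus (stub): net score = support - contradiction.
    best = max(claims, key=lambda c: c.support - c.contradiction)
    return best.diagnosis

def debate(findings: dict, counter_evidence: dict, rounds: int = 3) -> str:
    claims, remaining = [], dict(findings)
    for _ in range(rounds):
        if not remaining:
            break
        claim = opponent(proponent(remaining), counter_evidence)
        claims.append(claim)
        # A challenged hypothesis is set aside before the next round.
        del remaining[claim.diagnosis]
    return mediator(claims)

# Toy run: "pneumonia" has the strongest initial support but is heavily
# contradicted, so the debate settles on "effusion" instead.
findings = {"pneumonia": 0.9, "effusion": 0.7, "normal": 0.2}
counter = {"pneumonia": 0.6, "effusion": 0.1}
print(debate(findings, counter))  # -> effusion
```

The point of the sketch is the control flow, not the scoring: an initially dominant hypothesis can be overturned once the opponent's falsification evidence is weighed in, which is the confirmation-bias failure mode the framework targets.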

Zhixiang Lu, Jionglong Su • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Vision-Language Medical Reasoning | PathVQA | Token Cost (tokens/question) | 2.5 | 29 |
| Multimodal Medical Reasoning | VQA-RAD | Accuracy (%) | 80.45 | 18 |
| Multimodal Medical Reasoning | MIMIC-CXR VQA | Accuracy | 76.28 | 18 |
| Medical Visual Question Answering | MIMIC-CXR VQA | CHAIRS | 10.7 | 3 |
