DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels

About

Medical Visual Question Answering (Med-VQA) systems aid the interpretation of medical images containing critical clinical information. However, the challenge of noisy labels and limited high-quality datasets remains underexplored. To address this, we establish the first benchmark for noisy labels in Med-VQA by simulating human mislabeling with semantically designed noise types. More importantly, we introduce the DiN framework, which leverages a diffusion model to handle noisy labels in Med-VQA. Unlike the dominant classification-based VQA approaches that directly predict answers, our Answer Diffuser (AD) module employs a coarse-to-fine process, refining answer candidates with a diffusion model for improved accuracy. The Answer Condition Generator (ACG) further enhances this process by generating task-specific conditional information that integrates answer embeddings with fused image-question features. To address label noise, our Noisy Label Refinement (NLR) module introduces a robust loss function and dynamic answer adjustment to further boost the performance of the AD module.
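
The idea of simulating human mislabeling with semantic noise can be illustrated with a minimal sketch. It is not the authors' benchmark code: the abstract does not specify the noise types, so the `confusable` mapping, the function names `inject_semantic_noise` and `accuracy`, and the toy answers below are all hypothetical. The sketch only shows the general pattern of replacing a fraction of clean answers with plausible but wrong ones.

```python
import random


def inject_semantic_noise(labels, confusable, noise_rate, seed=0):
    """Return a copy of `labels` with some answers swapped for related ones.

    labels:      list of clean answer strings.
    confusable:  dict mapping an answer to plausible wrong answers
                 (hypothetical stand-in for the paper's noise types).
    noise_rate:  probability of corrupting a label that has candidates.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    noisy = []
    for answer in labels:
        candidates = confusable.get(answer, [])
        if candidates and rng.random() < noise_rate:
            noisy.append(rng.choice(candidates))  # semantically related flip
        else:
            noisy.append(answer)  # label left clean
    return noisy


def accuracy(predictions, targets):
    """Fraction of positions where prediction and target match exactly."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


# Toy answer set (assumed for illustration only).
clean = ["yes", "no", "left lung", "right lung", "ct", "mri"] * 50
confusable = {
    "yes": ["no"],
    "no": ["yes"],
    "left lung": ["right lung"],
    "right lung": ["left lung"],
    "ct": ["mri"],
    "mri": ["ct"],
}

noisy = inject_semantic_noise(clean, confusable, noise_rate=0.3, seed=0)
observed_rate = 1.0 - accuracy(noisy, clean)  # share of corrupted labels
```

Restricting each flip to a confusable set, rather than drawing uniformly from all answers, is what makes the noise "semantic" in this sketch: a wrong label is always one a human annotator could plausibly have chosen.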

Erjian Guo, Zhen Zhao, Zicheng Wang, Tong Chen, Yunyi Liu, Luping Zhou • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Medical Visual Question Answering | PathVQA | Overall Accuracy: 66.67 | 86 |
| Visual Question Answering | VQA-RAD | Closed Accuracy: 75.81 | 49 |
