
Learning to Trim: End-to-End Causal Graph Pruning with Dynamic Anatomical Feature Banks for Medical VQA

About

Medical Visual Question Answering (MedVQA) models often generalize poorly because they rely on dataset-specific correlations, such as recurring anatomical patterns or question-type regularities, rather than on genuine diagnostic evidence. Existing causal approaches are typically implemented as static adjustments or post-hoc corrections. To address this limitation, we propose a Learnable Causal Trimming (LCT) framework that integrates causal pruning into end-to-end optimization. We introduce a Dynamic Anatomical Feature Bank (DAFB), updated via a momentum mechanism, that captures global prototypes of frequent anatomical and linguistic patterns and serves as an approximation of dataset-level regularities. We further design a differentiable trimming module that estimates the dependency between instance-level representations and the global feature bank. Features highly correlated with the global prototypes are softly suppressed, while instance-specific evidence is emphasized. This learnable mechanism adaptively encourages the model to prioritize causal signals over spurious correlations. Experiments on VQA-RAD, SLAKE, SLAKE-CP, and PathVQA demonstrate that LCT consistently improves robustness and generalization over existing debiasing strategies.
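The abstract describes two components without giving formulas: a momentum-updated prototype bank and a differentiable module that softly suppresses features that depend strongly on that bank. The sketch below is a minimal, hypothetical illustration of that idea in plain Python, not the authors' implementation. Several choices are assumptions: each feature updates its nearest prototype through an exponential moving average, the dependency score is the cosine similarity between a feature and a softmax-weighted mix of prototypes, and suppression subtracts a sigmoid-gated projection onto that mix. The names `FeatureBank` and `soft_trim`, as well as the parameters `momentum`, `beta`, `tau`, and `temperature`, are illustrative. In an end-to-end model these operations would be tensor ops, with `beta` and `tau` plausibly learnable.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector, eps: float = 1e-8) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / (norm + eps) for x in v]


def cosine(a: Vector, b: Vector) -> float:
    return dot(normalize(a), normalize(b))


class FeatureBank:
    """Hypothetical momentum-updated prototype bank (a stand-in for the DAFB)."""

    def __init__(self, prototypes: List[Vector], momentum: float = 0.9):
        self.prototypes = [normalize(p) for p in prototypes]
        self.momentum = momentum

    def update(self, batch: List[Vector]) -> None:
        # Assumption: each feature is assigned to its most similar prototype.
        buckets = {k: [] for k in range(len(self.prototypes))}
        for x in batch:
            k = max(range(len(self.prototypes)),
                    key=lambda j: cosine(x, self.prototypes[j]))
            buckets[k].append(normalize(x))
        # EMA update: p <- m * p + (1 - m) * mean(assigned), then renormalize.
        for k, members in buckets.items():
            if not members:
                continue  # prototypes with no assigned features stay unchanged
            dim = len(members[0])
            mean = [sum(v[d] for v in members) / len(members) for d in range(dim)]
            old = self.prototypes[k]
            self.prototypes[k] = normalize(
                [self.momentum * o + (1 - self.momentum) * m for o, m in zip(old, mean)]
            )


def soft_trim(x: Vector, bank: FeatureBank, beta: float = 10.0,
              tau: float = 0.5, temperature: float = 0.1) -> Vector:
    """Softly remove the bank-explained component of x (hypothetical gating rule)."""
    sims = [cosine(x, p) for p in bank.prototypes]
    # Softmax attention over prototypes (shifted by the max for stability).
    top = max(sims)
    weights = [math.exp((s - top) / temperature) for s in sims]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Global context vector: attention-weighted mix of prototypes.
    context = [sum(w * p[d] for w, p in zip(weights, bank.prototypes))
               for d in range(len(x))]
    # Dependency score and sigmoid gate; beta and tau are fixed here for illustration.
    dependency = cosine(x, context)
    gate = 1.0 / (1.0 + math.exp(-beta * (dependency - tau)))
    # Subtract the gated projection of x onto the context direction.
    coef = dot(x, context) / (dot(context, context) + 1e-8)
    return [xi - gate * coef * ci for xi, ci in zip(x, context)]


# Usage: a feature aligned with a frequent prototype is shrunk sharply,
# while a feature orthogonal to the bank passes through almost unchanged.
bank = FeatureBank([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], momentum=0.9)
bank.update([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
common = soft_trim([1.0, 0.05, 0.0], bank)
specific = soft_trim([0.0, 0.0, 1.0], bank)
print(math.sqrt(dot(common, common)), math.sqrt(dot(specific, specific)))
```

The projection-based rule is one way to "suppress what the bank explains while keeping the residual". A simpler alternative would rescale the whole feature by `1 - gate`. However, that discards instance-specific directions together with the shared ones.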

Zibo Xu, Qiang Li, Weizhi Nie, Yuting Su (2026)

Related benchmarks

Task | Dataset | Metric | Result | Rank
Visual Question Answering | VQA-RAD (test) | Open-ended Accuracy | 70.9 | 46
Visual Question Answering | SLAKE (test) | Accuracy | 89.9 | 20
Medical Visual Question Answering | SLAKE-CP | Open Score | 28.9 | 9
