
Human Knowledge Integrated Multi-modal Learning for Single Source Domain Generalization

About

Generalizing image classification across domains remains challenging in critical tasks such as fundus image-based diabetic retinopathy (DR) grading and resting-state fMRI seizure onset zone (SOZ) detection. When domains differ in unknown causal factors, cross-domain generalization is difficult, and there is no established methodology for objectively assessing such differences without metadata or protocol-level information from the data collectors, which is typically inaccessible. We first introduce domain conformal bounds (DCB), a theoretical framework for evaluating whether domains diverge in unknown causal factors. Building on this, we propose GenEval, a multimodal vision-language model (VLM) approach that combines foundation models (e.g., MedGemma-4B) with human knowledge via Low-Rank Adaptation (LoRA) to bridge causal gaps and improve single-source domain generalization (SDG). Across eight DR and two SOZ datasets, GenEval achieves superior SDG performance, with average accuracy of 69.2% (DR) and 81% (SOZ), outperforming the strongest baselines by 9.4% and 1.8%, respectively.
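The abstract names Low-Rank Adaptation (LoRA) as the mechanism for injecting human knowledge into the foundation model. As a rough illustration of the general LoRA idea only (not GenEval's actual implementation, which is not described here), the sketch below keeps a base weight matrix W frozen and adds a trainable low-rank update, giving an effective weight W + (alpha / r) * B A. The matrix sizes, the rank, the alpha value, and all function names are hypothetical toy choices.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def init_lora(d_out, d_in, r, seed=0):
    """Standard LoRA initialisation: A is small and random, B is zero,
    so the adapter leaves the base model unchanged before training."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d_out)]
    return a, b

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)                      # A has shape (r, d_in)
    scale = alpha / r
    delta = matmul(b, a)            # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy dimensions (assumed for illustration; real model layers are far larger).
d_out, d_in, r = 4, 6, 2
w = [[float(i + j) for j in range(d_in)] for i in range(d_out)]  # frozen base weight
a, b = init_lora(d_out, d_in, r)

full_params = d_out * d_in          # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)    # parameters updated by the LoRA adapter
```

Only A and B would be trained, so the number of trainable parameters grows with r * (d_in + d_out) rather than d_in * d_out. The saving is small at these toy sizes and becomes substantial for the large projection matrices of a VLM.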

Ayan Banerjee, Kuntal Thakur, Sandeep Gupta • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Diabetic Retinopathy (DR) grading | APTOS | Accuracy | 73.2 | 25 |
| Diabetic Retinopathy (DR) grading | FGADR | Accuracy | 56.9 | 20 |
| Diabetic Retinopathy (DR) grading | IDRID | Accuracy | 70.6 | 20 |
| Diabetic Retinopathy Grading | Aptos (held-out) | Accuracy | 73.46 | 11 |
| Diabetic Retinopathy Grading | Messidor 2 (held-out) | Accuracy | 79.64 | 11 |
| Diabetic Retinopathy Grading | Messidor (held-out) | Accuracy | 67.7 | 11 |
| Diabetic Retinopathy Classification | EyePACS | Accuracy | 80.04 | 6 |
| Diabetic Retinopathy Classification | Messidor | Accuracy | 69.48 | 6 |
| Diabetic Retinopathy Grading | EyePACS (held-out target) | Accuracy | 83.18 | 6 |
| Diabetic Retinopathy Classification | APTOS | Accuracy | 73.16 | 6 |
Showing 10 of 17 rows
