
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model

About

Fine-grained multimodal capability in Multimodal Large Language Models (MLLMs) has emerged as a critical research direction, particularly for tackling the visual grounding (VG) problem. Despite the strong performance achieved by existing approaches, they often employ disparate design choices when fine-tuning MLLMs for VG, without systematic verification to support these designs. To bridge this gap, this paper presents a comprehensive study of various design choices that impact the VG performance of MLLMs. We conduct our analysis using LLaVA-1.5, which has been widely adopted in prior empirical studies of MLLMs. While more recent models exist, we follow this convention to ensure our findings remain broadly applicable and extendable to other architectures. We cover two key aspects: (1) exploring different visual grounding paradigms in MLLMs, identifying the most effective design, and providing our insights; and (2) conducting ablation studies on the design of grounding data to optimize MLLMs' fine-tuning for the VG task. Finally, our findings contribute to a stronger MLLM for VG, achieving improvements of +5.6% / +6.9% / +7.0% on RefCOCO/+/g over LLaVA-1.5.

Weitai Kang, Weiming Zhuang, Zhizhong Li, Yan Yan, Lingjuan Lyu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Referring Expression Comprehension | RefCOCO+ (val) | Accuracy 80.3 | 345 |
| Referring Expression Comprehension | RefCOCO (val) | Accuracy 87.4 | 335 |
| Referring Expression Comprehension | RefCOCO (testA) | Accuracy 0.917 | 333 |
| Referring Expression Comprehension | RefCOCOg (test) | Accuracy 81.4 | 291 |
| Referring Expression Comprehension | RefCOCOg (val) | Accuracy 81.3 | 291 |
| Referring Expression Comprehension | RefCOCO+ (test-A) | Accuracy 86.9 | 172 |
| Referring Expression Comprehension | RefCOCO+ (test-B) | Accuracy 71.1 | 167 |
| Referring Expression Comprehension | RefCOCO (test-B) | Accuracy 81.5 | 160 |
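
For readers unfamiliar with the Accuracy metric reported for referring expression comprehension, the following is a minimal sketch of how it is commonly computed. It assumes the usual convention that a predicted box counts as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5; this page does not state the threshold, so treat it as an assumption. The function names `box_iou` and `rec_accuracy` and the `(x1, y1, x2, y2)` box format are illustrative choices, not taken from the paper.

```python
from typing import Sequence

# Assumed box format: (x1, y1, x2, y2) corner coordinates.
Box = Sequence[float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rec_accuracy(
    preds: Sequence[Box],
    gts: Sequence[Box],
    iou_threshold: float = 0.5,  # assumed convention, not stated on this page
) -> float:
    """Percentage of predictions whose IoU with the ground truth meets the threshold."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(box_iou(p, g) >= iou_threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(preds)
```

Under this sketch, one exact match and one poorly overlapping box out of two predictions would score 50.0, on the same percentage scale as most rows in the table.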
