Are Large Vision-Language Models Ready to Guide Blind and Low-Vision Individuals?

About

Large Vision-Language Models (LVLMs) demonstrate a promising direction for assisting individuals with blindness or low vision (BLV). Yet measuring their true utility in real-world scenarios is challenging, because evaluating whether their descriptions are BLV-informative requires a fundamentally different approach from assessing standard scene descriptions. While the "VLM-as-a-metric" or "LVLM-as-a-judge" paradigm has emerged, existing evaluators still fall short of capturing the unique requirements of BLV-centric evaluation, lacking at least one of the following key properties: (1) high correlation with human judgments, (2) long instruction understanding, (3) score generation efficiency, and (4) multi-dimensional assessment. To this end, we propose a unified framework to bridge the gap between automated evaluation and actual BLV needs. First, we conduct an in-depth user study with BLV participants to understand and quantify their navigational preferences, curating VL-GUIDEDATA, a large-scale BLV user-simulated preference dataset containing image-request-response-score pairs. We then leverage the dataset to develop an accessibility-aware evaluator, VL-GUIDE-S, which outperforms existing (L)VLM judges in both human alignment and inference efficiency. Notably, its effectiveness extends beyond a single domain, demonstrating strong performance across multiple fine-grained, BLV-critical dimensions. We hope our work lays a foundation for automatic AI judges that advance safe, barrier-free navigation for BLV users.

Eunki Kim, Na Min An, Wan Ju Kang, Sangryul Kim, James Thorne, Hyunjung Shim • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Preference Evaluation | FOIL R1 | Preference Accuracy | 95 | 10 |
| Multimodal Preference Evaluation | FOIL R4 | P-Acc | 95 | 10 |
| Multimodal Preference Evaluation | Polaris | tau_c | 53.9 | 10 |
| Multimodal Preference Evaluation | Pascal | P-Acc | 82.3 | 10 |
| Multimodal Preference Evaluation | FlickrExp | tau_c | 51.7 | 10 |
| Multimodal Preference Evaluation | FlickrCF | Tau-b Score | 35.8 | 10 |
| Multimodal Preference Evaluation | VL-GUIDEDATA-B | Kendall's Tau | 10.28 | 8 |
| Multimodal Preference Evaluation | OID | P-Acc | 59.3 | 7 |
| Multimodal Preference Evaluation | ImgREW | P-Acc | 57.8 | 7 |
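
The benchmarks above report two kinds of metric: rank correlation with human scores (Kendall's tau variants) and pairwise preference accuracy (P-Acc). As a rough illustration of what each measures, the Python sketch below computes Kendall's tau-b and a simple pairwise accuracy on made-up scores. The function names and the toy data are hypothetical, and the exact protocol behind each listed result (tie handling, tau-b versus tau-c, score scaling) is not stated on this page and may differ.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(human_scores, judge_scores):
    """Kendall's tau-b between two equally long score lists (O(n^2) pair count)."""
    if len(human_scores) != len(judge_scores):
        raise ValueError("score lists must have the same length")

    concordant = discordant = tied_human = tied_judge = 0
    for (h1, j1), (h2, j2) in combinations(zip(human_scores, judge_scores), 2):
        dh = h1 - h2
        dj = j1 - j2
        if dh == 0:
            tied_human += 1  # pair tied in the human scores
        if dj == 0:
            tied_judge += 1  # pair tied in the judge scores
        if dh * dj > 0:
            concordant += 1  # both rankings order the pair the same way
        elif dh * dj < 0:
            discordant += 1  # the rankings disagree on this pair

    n = len(human_scores)
    n_pairs = n * (n - 1) // 2
    denom = sqrt((n_pairs - tied_human) * (n_pairs - tied_judge))
    if denom == 0:
        return float("nan")  # undefined when one list is entirely tied
    return (concordant - discordant) / denom


def preference_accuracy(score_pairs):
    """Fraction of (preferred, rejected) judge-score pairs ranked correctly.

    Assumption: a pair counts as correct only if the judge scores the
    human-preferred response strictly higher than the rejected one.
    """
    if not score_pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for preferred, rejected in score_pairs if preferred > rejected)
    return correct / len(score_pairs)


# Toy data (hypothetical): human ratings vs. judge scores for five responses.
human = [1, 2, 3, 4, 5]
judge = [0.1, 0.3, 0.2, 0.8, 0.9]
print(kendall_tau_b(human, judge))  # 9 concordant, 1 discordant pair -> 0.8

# Toy data (hypothetical): judge scores for (preferred, rejected) responses.
pairs = [(0.9, 0.2), (0.4, 0.6), (0.7, 0.1), (0.5, 0.3)]
print(preference_accuracy(pairs))  # 3 of 4 pairs ranked correctly -> 0.75
```

Both functions return raw values (tau in [-1, 1], accuracy in [0, 1]); any rescaling needed to compare against the table is left out because the page does not specify it.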