DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

About

Humans can effortlessly locate desired objects in cluttered environments, relying on a cognitive mechanism known as visual search to filter out irrelevant information and focus on task-related regions. Inspired by this process, we propose DyFo (Dynamic Focus), a training-free dynamic focusing visual search method that enhances fine-grained visual understanding in large multimodal models (LMMs). Unlike existing approaches that require additional modules or data collection, DyFo leverages a bidirectional interaction between LMMs and visual experts, using a Monte Carlo Tree Search (MCTS) algorithm to simulate human-like focus adjustments. This enables LMMs to focus on key visual regions while filtering out irrelevant content, without the additional training that vocabulary expansion or the integration of specialized localization modules would require. Experimental results demonstrate that DyFo significantly improves fine-grained visual understanding and reduces hallucination in LMMs, achieving superior performance on both fixed-resolution and dynamic-resolution models. The code is available at https://github.com/PKU-ICST-MIPL/DyFo_CVPR2025

Geng Li, Jinglin Xu, Yunzhen Zhao, Yuxin Peng • 2025
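
To make the idea of a tree search over focus regions concrete, here is a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a focus state is a normalized bounding box, that each action zooms into one quadrant of the current box, and that a caller-supplied `reward` function scores a box. In DyFo that score would come from the interaction between the LMM and the visual experts; here it is replaced by a toy overlap measure against a known target. It also simplifies MCTS by evaluating each node directly instead of running random rollouts. The names `Node`, `quadrants`, `iou`, and `search` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


def quadrants(box: Box) -> list[Box]:
    """Candidate focus shifts (assumed action set): zoom into one quadrant."""
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union; a toy stand-in for a model-based reward."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Node:
    box: Box
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # sum of rewards backed up through this node

    def uct(self, c: float) -> float:
        """Upper confidence bound: mean reward plus an exploration bonus."""
        if self.visits == 0:
            return math.inf  # always try unvisited children first
        explore = math.sqrt(math.log(self.parent.visits) / self.visits)
        return self.value / self.visits + c * explore


def search(reward, root_box: Box = (0.0, 0.0, 1.0, 1.0),
           iterations: int = 200, max_depth: int = 3, c: float = 1.4):
    """Return the best-scoring focus box found and its reward."""
    root = Node(root_box)
    best_box, best_reward = root_box, reward(root_box)
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: descend by UCT until reaching a leaf.
        while node.children:
            node = max(node.children, key=lambda n: n.uct(c))
            depth += 1
        # Expansion: grow a leaf that has been visited before.
        if node.visits > 0 and depth < max_depth:
            node.children = [Node(b, parent=node) for b in quadrants(node.box)]
            node = node.children[0]
        # Evaluation: score the focus region directly (no random rollout).
        r = reward(node.box)
        if r > best_reward:
            best_box, best_reward = node.box, r
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best_box, best_reward


# Toy usage: the "object" occupies a small region of the image.
target = (0.5, 0.5, 0.75, 0.75)
print(search(lambda box: iou(box, target)))
```

With this toy reward, the search narrows from the full image to the bottom-right quadrant and then to the sub-region that matches the target. A real system would replace `iou` with a score derived from model outputs, which is where most of the practical difficulty lies.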

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Object Hallucination Evaluation | POPE | Accuracy 84.8 | 1455 |
| Science Question Answering | ScienceQA (SQA) | Accuracy 71.6 | 273 |
| Visual Grounded Reasoning | TreeBench | Overall Score 39.3 | 128 |
| Visual Question Answering | V*Bench | Accuracy 81.2 | 84 |
| High-resolution Visual Understanding | HR-Bench-8K | FSP 71.5 | 73 |
| High-resolution perception | HR-Bench-4K | Overall Score 67.88 | 44 |
| Visual Perception and Reasoning | V*Bench | Attribute Score 86.09 | 41 |
| Visually Grounded Reasoning | V*Bench | Average Accuracy 84.3 | 32 |
| Multimodal Understanding | MMStar | Average Score 34.6 | 31 |
| Visual Reasoning | V* | Overall Score 81.2 | 22 |
Showing 10 of 15 rows
