
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation

About

We present a framework for perspective-aware reasoning in vision-language models (VLMs) through mental imagery simulation. Perspective-taking, the ability to perceive an environment or situation from an alternative viewpoint, is a key benchmark for human-level visual understanding, essential for environmental interaction and collaboration with autonomous agents. Despite advancements in spatial reasoning within VLMs, recent research has shown that modern VLMs significantly lack perspective-aware reasoning capabilities and exhibit a strong bias toward egocentric interpretations. To bridge the gap between VLMs and human perception, we focus on the role of mental imagery, where humans perceive the world through abstracted representations that facilitate perspective shifts. Motivated by this, we propose a framework for perspective-aware reasoning, named Abstract Perspective Change (APC), that effectively leverages vision foundation models, such as object detection, segmentation, and orientation estimation, to construct scene abstractions and enable perspective transformations. Our experiments on synthetic and real-image benchmarks, compared with various VLMs, demonstrate significant improvements in perspective-aware reasoning with our framework, further outperforming fine-tuned spatial reasoning models and novel-view-synthesis-based approaches.
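The perspective shift described above can be illustrated with a small geometric sketch. This is not the authors' implementation. It assumes a scene abstraction has already been reduced to 2D ground-plane positions and facing angles, which in APC would come from vision foundation models but are hard-coded here. The names `AbstractObject`, `to_viewer_frame`, and `side_of` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AbstractObject:
    """Hypothetical scene-abstraction entry: a labeled point with a facing direction."""
    name: str
    x: float        # camera ground plane: +x is to the camera's right
    z: float        # camera ground plane: +z is forward, away from the camera
    yaw_deg: float  # facing direction, measured from +z toward +x


def to_viewer_frame(viewer: AbstractObject, target: AbstractObject) -> tuple[float, float]:
    """Return the target's (right, forward) coordinates in the viewer's own frame."""
    dx, dz = target.x - viewer.x, target.z - viewer.z
    yaw = math.radians(viewer.yaw_deg)
    # Viewer's forward axis is (sin yaw, cos yaw); its right axis is (cos yaw, -sin yaw).
    right = dx * math.cos(yaw) - dz * math.sin(yaw)
    forward = dx * math.sin(yaw) + dz * math.cos(yaw)
    return right, forward


def side_of(viewer: AbstractObject, target: AbstractObject) -> str:
    """Answer a left/right question from the viewer's perspective."""
    right, _ = to_viewer_frame(viewer, target)
    return "right" if right > 0 else "left"


# Assumed toy scene: a person 3 m ahead of the camera, facing it, with a cup beside them.
camera = AbstractObject("camera", x=0.0, z=0.0, yaw_deg=0.0)
person = AbstractObject("person", x=0.0, z=3.0, yaw_deg=180.0)
cup = AbstractObject("cup", x=1.0, z=3.0, yaw_deg=0.0)

egocentric_answer = side_of(camera, cup)    # the camera's own view
allocentric_answer = side_of(person, cup)   # the person's view
```

In this toy scene the cup is on the right from the camera's viewpoint but on the left from the person's viewpoint, which is the kind of egocentric versus allocentric disagreement the benchmarks below probe.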

Phillip Y. Lee, Jihyeon Je, Chanho Park, Mikaela Angelina Uy, Leonidas Guibas, Minhyuk Sung • 2025

Related benchmarks

Task | Dataset | Result | Rank
Spatial Reasoning | VSI-Bench | -- | 24
3D Spatial Reasoning | 3DSRBench | Accuracy: 64 | 23
Allocentric Spatial Reasoning | 3DSRBench | Left/Right Acc: 77.94 | 19
Perspective-aware spatial reasoning | COMFORT Visual Illusions | Directional Accuracy (Left/Right): 84.31 | 19
Allocentric Spatial Reasoning | COMFORT# | Left/Right Accuracy: 47.83 | 19
Egocentric Spatial Reasoning | COCOSPATIAL | Left/Right Accuracy: 49.92 | 19
Spatial Reasoning | STI-Bench | D-Measure Score: 32 | 18
Spatial Reasoning | Metro-Spatial-QA | Measurement Accuracy: 45.4 | 15
Spatial Reasoning (Video) | VSI-Bench | Accuracy: 62.5 | 14
Spatial Reasoning | COCO 2017 (val) | Alignment Accuracy: 80 | 12
Showing 10 of 13 rows
