
Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning

About

Vision-as-inverse-graphics, the concept of reconstructing an image as an editable graphics program, is a long-standing goal of computer vision. Yet even strong VLMs cannot achieve this in one shot, because they lack fine-grained spatial and physical grounding. Our key insight is that closing this gap requires interleaved multimodal reasoning through iterative execution and verification. Building on this insight, we present VIGA (Vision-as-Inverse-Graphics Agent), which starts from an empty world and reconstructs or edits scenes through a closed-loop write-run-render-compare-revise procedure. To support long-horizon reasoning, VIGA combines (i) a skill library that alternates between generator and verifier roles and (ii) an evolving context memory that stores plans, code diffs, and render history. VIGA is task-agnostic, since it requires no auxiliary modules, and covers a wide range of tasks, including 3D reconstruction, multi-step scene editing, 4D physical interaction, and 2D document editing. Empirically, VIGA substantially improves over one-shot baselines on BlenderGym (35.32%) and SlideBench (117.17%). VIGA is also model-agnostic, since it requires no fine-tuning, which enables a unified protocol for evaluating heterogeneous foundation VLMs. To support this protocol, we introduce BlenderBench, a challenging benchmark that stress-tests interleaved multimodal reasoning with a graphics engine, on which VIGA improves by 124.70%.
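The abstract describes the closed loop only at a high level. The Python sketch below illustrates one way a write-run-render-compare-revise loop with alternating generator and verifier roles and an evolving context memory could be structured. It is not VIGA's implementation: every name here (`ContextMemory`, `write_run_render_compare_revise`, `generate`, `render`, `verify`, and the one-parameter toy scene) is hypothetical. In a real agent, a VLM would presumably fill the generator and verifier roles and a graphics engine would do the rendering.

```python
import difflib
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ContextMemory:
    """Hypothetical evolving context: plans, code diffs, and render history."""
    plans: List[str] = field(default_factory=list)
    code_diffs: List[str] = field(default_factory=list)
    renders: List[Any] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)


def code_diff(old: str, new: str) -> str:
    """Unified diff between two program versions (stored instead of full copies)."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="before",
            tofile="after",
        )
    )


def write_run_render_compare_revise(
    target: Any,
    generate: Callable[[Any, str, ContextMemory], Tuple[str, str]],
    render: Callable[[str], Any],
    verify: Callable[[Any, Any], float],
    max_iters: int = 20,
    tol: float = 1e-3,
) -> Tuple[str, float, ContextMemory]:
    memory = ContextMemory()
    program = ""  # start from an empty world (empty program)
    best_program, best_score = program, float("inf")
    for _ in range(max_iters):
        # Generator role: plan, then write or revise the program using the memory.
        plan, new_program = generate(target, program, memory)
        memory.plans.append(plan)
        memory.code_diffs.append(code_diff(program, new_program))
        # Run the program and render the resulting scene.
        image = render(new_program)
        memory.renders.append(image)
        # Verifier role: compare the render with the target (lower is better).
        score = verify(image, target)
        memory.scores.append(score)
        if score < best_score:
            best_program, best_score = new_program, score
        program = new_program
        if score <= tol:  # close enough to the target: stop revising
            break
    return best_program, best_score, memory


# Toy usage (hypothetical): the "scene" is one parameter and the "image" is its value.
def toy_render(program: str) -> float:
    return float(program.split("=", 1)[1]) if program else 0.0


def toy_generate(target: float, program: str, memory: ContextMemory) -> Tuple[str, str]:
    last = memory.renders[-1] if memory.renders else 0.0
    new_value = last + 0.5 * (target - last)  # revise based on the latest render
    return "move radius halfway toward target", f"radius = {new_value}"


best, score, mem = write_run_render_compare_revise(
    4.0, toy_generate, toy_render, lambda img, tgt: abs(img - tgt)
)
print(best, score, len(mem.renders))
```

Storing diffs rather than full program copies keeps the memory compact over long horizons; whether VIGA stores its history this way beyond what the abstract states (plans, code diffs, render history) is an assumption.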

Shaofeng Yin, Jiaxin Ge, Zora Zhiruo Wang, Xiuyu Li, Michael J. Black, Trevor Darrell, Angjoo Kanazawa, Haiwen Feng · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Graphic Editing | BlenderGym | PL (Blend Shape) | 13.51 | 18 |
| Camera Adjustment | BlenderBench | PL | 0.6082 | 10 |
| Multi-step Editing | BlenderBench | PL | 33.14 | 10 |
| Compositional Editing | BlenderBench | PL | 30.14 | 10 |
| 2D Slide Generation | SlideBench | Execution Score | 95 | 8 |
| Overall Evaluation | BlenderBench | Improvement | 159.2 | 8 |
| Task 1 | BlenderBench | PL | 60.82 | 8 |
| Task 2 | BlenderBench | PL | 33.14 | 8 |
| Task 3 | BlenderBench | PL | 8.98 | 8 |
