V-CAGE: Vision-Closed-Loop Agentic Generation Engine for Robotic Manipulation

About

Scaling Vision-Language-Action (VLA) models requires massive datasets that are both semantically coherent and physically feasible. However, existing scene generation methods often lack context awareness. This makes it difficult to synthesize high-fidelity environments with rich semantic information, and it frequently places targets in unreachable positions, so tasks fail prematurely. We present V-CAGE (Vision-Closed-Loop Agentic Generation Engine), an agentic framework for autonomous robotic data synthesis. Unlike traditional scripted pipelines, V-CAGE operates as an embodied agentic system that leverages foundation models to bridge high-level semantic reasoning with low-level physical interaction. Specifically, we introduce Inpainting-Guided Scene Construction to systematically arrange context-aware layouts, ensuring that generated scenes are both semantically structured and kinematically reachable. To ensure trajectory correctness, we combine functional metadata with a closed-loop verification mechanism based on a Vision-Language Model (VLM). The VLM acts as a visual critic that filters out silent failures and breaks the chain of error propagation. Finally, to overcome the storage bottleneck of massive video datasets, we implement a perceptually driven compression algorithm that reduces file size by over 90% without compromising downstream VLA training efficacy. By centralizing semantic layout planning and visual self-verification, V-CAGE automates the end-to-end pipeline and enables highly scalable synthesis of diverse, high-quality robotic manipulation datasets.
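The closed-loop filtering idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The reachability test is a crude "within a sphere around the robot base" proxy, used here as a simple post-hoc filter in place of the kinematic reasoning that Inpainting-Guided Scene Construction performs. The names `execute`, `critic`, `toy_execute`, `toy_critic`, `max_reach` and the example layouts are all hypothetical stand-ins for the simulator rollout and the VLM-based visual critic. The point of the sketch is the control flow: infeasible layouts are discarded before any simulation, and each subtask is checked as soon as it finishes, so the first silent failure ends the episode instead of propagating into later steps.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Layout = Dict[str, Point]  # object name -> placement (x, y, z) in the robot base frame


def is_reachable(target: Point, base: Point, max_reach: float) -> bool:
    """Crude proxy for a kinematic check: target lies within max_reach of the base."""
    return math.dist(target, base) <= max_reach


def layout_is_feasible(layout: Layout, base: Point, max_reach: float) -> bool:
    """Reject a layout if any object is placed outside the reachable sphere."""
    return all(is_reachable(p, base, max_reach) for p in layout.values())


def run_verified_episode(
    layout: Layout,
    subtasks: Sequence[str],
    execute: Callable[[Layout, str], object],
    critic: Callable[[str, object], bool],
) -> Tuple[bool, List[object]]:
    """Execute subtasks in order and let the critic check each resulting observation.

    Stops at the first rejected step, so a silent failure is not carried into
    later subtasks. Returns (success, observations recorded so far).
    """
    observations: List[object] = []
    for sub in subtasks:
        obs = execute(layout, sub)
        observations.append(obs)
        if not critic(sub, obs):
            return False, observations  # break the error-propagation chain here
    return True, observations


def synthesize(
    layouts: Sequence[Layout],
    subtasks: Sequence[str],
    execute: Callable[[Layout, str], object],
    critic: Callable[[str, object], bool],
    base: Point = (0.0, 0.0, 0.0),
    max_reach: float = 0.8,  # hypothetical arm reach in metres
) -> Dict[str, int]:
    """Run every candidate layout through both filters and count the outcomes."""
    stats = {"unreachable": 0, "rejected": 0, "kept": 0}
    for layout in layouts:
        if not layout_is_feasible(layout, base, max_reach):
            stats["unreachable"] += 1  # filtered before any simulation cost
            continue
        ok, _ = run_verified_episode(layout, subtasks, execute, critic)
        stats["kept" if ok else "rejected"] += 1
    return stats


# Toy stand-ins for the simulator and the VLM critic (hypothetical, for illustration only).
def toy_execute(layout: Layout, subtask: str) -> dict:
    # Pretend every subtask succeeds except closing the bag.
    return {"subtask": subtask, "goal_visible": subtask != "close bag"}


def toy_critic(subtask: str, obs: dict) -> bool:
    # A real critic would inspect rendered frames; here we read a flag.
    return obs["goal_visible"]


if __name__ == "__main__":
    candidates = [
        {"bread": (0.3, 0.2, 0.1), "bag": (0.4, -0.1, 0.1)},
        {"bread": (1.5, 0.0, 0.1), "bag": (0.4, -0.1, 0.1)},  # bread out of reach
    ]
    print(synthesize(candidates, ["pick bread", "place bread in bag"], toy_execute, toy_critic))
```

With the two toy layouts, only the first is both reachable and accepted by the critic at every step; the second is rejected by the reachability filter without ever being simulated. Verifying per subtask rather than only at the end is what keeps a mid-episode failure from being recorded as a successful long-horizon demonstration.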

Yaru Liu, Ao-bo Wang, Nanyang Ye · 2026

Related benchmarks

Task | Dataset | Result | Rank
Long-horizon robotic manipulation | AutoCheckout Synthesized | -- | 3
Long-horizon robotic manipulation | PackBreads Synthesized | -- | 3
Long-horizon robotic manipulation | PackStationery Synthesized | -- | 3
Long-horizon robotic manipulation | SortToCabinet Synthesized | -- | 3
