Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

About

Despite significant progress in robotics and embodied AI in recent years, deploying robots for long-horizon tasks remains a great challenge. The majority of prior work follows an open-loop philosophy and lacks real-time feedback, leading to error accumulation and poor robustness. A handful of approaches have attempted to establish feedback mechanisms based on pixel-level differences or pre-trained visual representations, yet their efficacy and adaptability have proven limited. Inspired by classic closed-loop control systems, we propose CLOVER, a closed-loop visuomotor control framework that incorporates feedback mechanisms to improve adaptive robotic control. CLOVER consists of a text-conditioned video diffusion model that generates visual plans as reference inputs, a measurable embedding space for accurate error quantification, and a feedback-driven controller that refines actions from feedback and initiates replans as needed. Our framework shows notable improvements in real-world robotic tasks and achieves state-of-the-art performance on the CALVIN benchmark, improving by 8% over previous open-loop counterparts. Code and checkpoints are maintained at https://github.com/OpenDriveLab/CLOVER.

Qingwen Bu, Jia Zeng, Li Chen, Yanchao Yang, Guyue Zhou, Junchi Yan, Ping Luo, Heming Cui, Yi Ma, Hongyang Li • 2024
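
The three components named in the abstract (a planner that produces reference sub-goals, an embedding space in which error is measured, and a feedback-driven controller that can trigger replanning) can be illustrated with a minimal sketch. This is not taken from the CLOVER code base: every name here (`closed_loop_rollout`, `plan_fn`, `embed_fn`, `policy_fn`, `step_fn`, the two tolerances) is hypothetical, the planner merely stands in for the video diffusion model, and Euclidean distance is an assumed error measure. The toy environment is a 1-D point moving toward a target.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def embedding_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors (assumed error measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closed_loop_rollout(
    plan_fn: Callable[[Vector], List[Vector]],   # stand-in for a visual planner
    embed_fn: Callable[[Vector], Vector],        # maps observations to embeddings
    policy_fn: Callable[[Vector, Vector], Vector],  # (current, goal) -> action
    step_fn: Callable[[Vector, Vector], Vector],    # (observation, action) -> next obs
    observation: Vector,
    reach_tol: float = 0.05,
    replan_tol: float = 1.0,
    max_steps: int = 100,
) -> Tuple[Vector, int]:
    """Track sub-goals one by one; replan if the embedding error grows too large.

    Returns the final observation and the number of replans triggered.
    """
    subgoals = [embed_fn(g) for g in plan_fn(observation)]
    idx, replans = 0, 0
    for _ in range(max_steps):
        if idx >= len(subgoals):
            break  # every sub-goal has been reached
        current = embed_fn(observation)
        error = embedding_distance(current, subgoals[idx])
        if error < reach_tol:
            idx += 1  # sub-goal reached, move on to the next one
            continue
        if error > replan_tol:
            # Deviation too large: request a fresh plan from the current state.
            subgoals = [embed_fn(g) for g in plan_fn(observation)]
            idx, replans = 0, replans + 1
            continue
        observation = step_fn(observation, policy_fn(current, subgoals[idx]))
    return observation, replans


# Toy 1-D setup (all hypothetical): four evenly spaced waypoints toward x = 1.0,
# an identity embedding, and a proportional controller with gain 0.5.
def toy_plan(obs: Vector) -> List[Vector]:
    return [[obs[0] + (1.0 - obs[0]) * k / 4] for k in range(1, 5)]


def toy_policy(current: Vector, goal: Vector) -> Vector:
    return [0.5 * (g - c) for c, g in zip(current, goal)]


def toy_step(obs: Vector, action: Vector) -> Vector:
    return [o + a for o, a in zip(obs, action)]


final_obs, n_replans = closed_loop_rollout(
    toy_plan, lambda o: list(o), toy_policy, toy_step, [0.0]
)
```

In this toy run the point ends within `reach_tol` of the target without replanning. The error check at every step is what makes the loop closed: an open-loop variant would execute the planned actions without comparing the observation against the sub-goal.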

Related benchmarks

Task	Dataset	Metric	Result
Robotic Manipulation	Calvin ABCD→D	Avg Length	3.53	130
Long-horizon robot manipulation	Calvin ABCD→D	Task 1 Completion Rate	96	127
Robotic Manipulation	Calvin ABC->D	Task-1 Score	96	71
Sequential Robotic Manipulation	CALVIN	Success Rate (1 task)	96	63
Robot Manipulation	Calvin ABC->D	Average Successful Length	3.53	62
Instruction-following robotic manipulation	CALVIN ABC→D (unseen environment D)	Success Rate (Length 1)	96	29
Long-Horizon Multi-Task Language Control	CALVIN ABC→D (test)	Seq Success (1)	96	13
Language-conditioned visuomotor control	CALVIN ABC→D (Zero-shot)	Completion Rate (Seq 1)	96	8
Long-horizon robotic manipulation	AIRBOT Play real-world	Sub-task 1 Success Rate	93.3	4
Stack two bowls	AIRBOT Play real-world	Success Rate	86.7	4

Showing 10 of 11 rows
