Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought

About

Language instructions and demonstrations are two natural ways for users to teach robots personalized tasks. Recent progress in Large Language Models (LLMs) has shown impressive performance in translating language instructions into code for robotic tasks. However, translating demonstrations into task code continues to be a challenge due to the length and complexity of both demonstrations and code, making learning a direct mapping intractable. This paper presents Demo2Code, a novel framework that generates robot task code from demonstrations via an extended chain-of-thought and defines a common latent specification to connect the two. Our framework employs a robust two-stage process: (1) a recursive summarization technique that condenses demonstrations into concise specifications, and (2) a code synthesis approach that expands each function recursively from the generated specifications. We conduct extensive evaluation on various robot task benchmarks, including a novel game benchmark Robotouille, designed to simulate diverse cooking tasks in a kitchen environment. The project's website is available at https://portal-cornell.github.io/demo2code/
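The two-stage process described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `LLM` callable, the prompt strings, the `is_concise` stopping check, the `known` set of robot API primitives, and the `toy_llm` stand-in are all hypothetical names introduced here so the control flow can be run without a real model.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


def summarize_recursively(demos: List[str], llm: LLM,
                          is_concise: Callable[[List[str]], bool],
                          max_rounds: int = 5) -> str:
    """Stage 1 (assumed shape): condense each demonstration until the
    stopping check passes, then merge the summaries into one specification."""
    summaries = list(demos)
    for _ in range(max_rounds):
        if is_concise(summaries):
            break
        summaries = [llm("Summarize:\n" + s) for s in summaries]
    return llm("Write a task specification from:\n" + "\n".join(summaries))


def expand_recursively(spec: str, llm: LLM, known: Set[str],
                       max_depth: int = 3) -> Dict[str, str]:
    """Stage 2 (assumed shape): generate top-level task code, then
    recursively define every called function that is not yet defined."""
    definitions: Dict[str, str] = {}

    def expand(name: str, prompt: str, depth: int) -> None:
        code = llm(prompt)
        definitions[name] = code
        if depth >= max_depth:
            return
        # Crude call detection: any identifier directly followed by "(".
        for callee in sorted(set(re.findall(r"\b([A-Za-z_]\w*)\(", code))):
            if callee not in known and callee not in definitions:
                expand(callee, f"Define function {callee} for spec:\n{spec}",
                       depth + 1)

    expand("main", "Write task code for spec:\n" + spec, 0)
    return definitions


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    if prompt.startswith("Summarize:"):
        return "short"
    if prompt.startswith("Write a task specification"):
        return "SPEC"
    if prompt.startswith("Write task code"):
        return "def main():\n    cook_patty()\n    move_to(1)\n"
    if prompt.startswith("Define function cook_patty"):
        return "def cook_patty():\n    move_to(2)\n"
    return ""
```

With the stand-in model, `summarize_recursively` shortens the demonstrations until `is_concise` accepts them, and `expand_recursively` returns one code string per function it had to define, skipping anything listed in `known`. A real system would replace `toy_llm` with calls to an actual model and use a proper parser instead of the regular expression.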

Huaxiaoyue Wang, Gonzalo Gonzalez-Pumariega, Yash Sharma, Sanjiban Choudhury • 2023

Related benchmarks

Task	Dataset	Result
Robotic manipulation task code generation	Tabletop Manipulation simulator	Execution Success Rate: 100	30
Robotic Manipulation	RLBench standard (test)	Reach Target Success Rate: 94	12
Cross-domain demo-to-code	Real-world demonstrations and deployment Medium-Complexity	Success Rate (SR): 25	11
Cross-domain demo-to-code	Obstruction and Object affordance High-Complexity	Success Rate (SR): 22.5	7
Cross-domain demo-to-code	Kinematic configuration and Gripper type Medium-Complexity	Success Rate (SR): 25	7
Cross-domain demo-to-code	Obstruction and Object affordance Low-Complexity	Success Rate (SR): 26.67	7
Cross-domain demo-to-code	Kinematic configuration and Gripper type Low-Complexity	Success Rate (SR): 33.33	7
Cross-domain demo-to-code	Kinematic configuration and Gripper type High-Complexity	Success Rate (SR): 20	7
Cross-domain demo-to-code	Combination Factor Low-Complexity	Success Rate (SR): 30	7
Cross-domain demo-to-code	Combination Factor Medium-Complexity	Success Rate (SR): 20	7

