Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought
About
Language instructions and demonstrations are two natural ways for users to teach robots personalized tasks. Recent progress in Large Language Models (LLMs) has shown impressive performance in translating language instructions into code for robotic tasks. However, translating demonstrations into task code continues to be a challenge due to the length and complexity of both demonstrations and code, making learning a direct mapping intractable. This paper presents Demo2Code, a novel framework that generates robot task code from demonstrations via an extended chain-of-thought and defines a common latent specification to connect the two. Our framework employs a robust two-stage process: (1) a recursive summarization technique that condenses demonstrations into concise specifications, and (2) a code synthesis approach that expands each function recursively from the generated specifications. We conduct extensive evaluation on various robot task benchmarks, including a novel game benchmark Robotouille, designed to simulate diverse cooking tasks in a kitchen environment. The project's website is available at https://portal-cornell.github.io/demo2code/
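The two-stage process described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `LLM` callable, the prompt strings, the `primitives` set, and the toy cooking demonstration are all hypothetical stand-ins chosen for illustration, and a deterministic `fake_llm` replaces a real model so the example runs offline. Stage 1 condenses a demonstration repeatedly until it is short enough to serve as a specification. Stage 2 generates top-level code from that specification and keeps defining any helper function that is called but not yet defined.

```python
import ast
from typing import Callable, List, Set

LLM = Callable[[str], str]  # hypothetical interface: prompt text in, completion text out


def summarize_recursively(demo: List[str], llm: LLM, max_lines: int = 4) -> str:
    """Stage 1 (sketch): condense a demonstration until it fits in max_lines."""
    lines = list(demo)
    while len(lines) > max_lines:
        summary = llm("Summarize:\n" + "\n".join(lines))
        new_lines = [ln for ln in summary.splitlines() if ln.strip()]
        if len(new_lines) >= len(lines):  # no progress, stop instead of looping forever
            break
        lines = new_lines
    return "\n".join(lines)


def called_names(code: str) -> Set[str]:
    """Names of functions called directly (e.g. f(...)) anywhere in the code."""
    return {n.func.id for n in ast.walk(ast.parse(code))
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}


def defined_names(code: str) -> Set[str]:
    """Names of functions defined with `def` anywhere in the code."""
    return {n.name for n in ast.walk(ast.parse(code))
            if isinstance(n, ast.FunctionDef)}


def synthesize_code(spec: str, llm: LLM, primitives: Set[str], max_depth: int = 3) -> str:
    """Stage 2 (sketch): write top-level code, then expand undefined helpers."""
    code = llm("Write task code for this specification:\n" + spec)
    for _ in range(max_depth):
        undefined = called_names(code) - defined_names(code) - primitives
        if not undefined:
            break
        for name in sorted(undefined):
            # Prepend each new definition so helpers appear before their callers.
            code = llm(f"Define the function {name}:\n" + spec) + "\n\n" + code
    return code


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model (assumption, for illustration only)."""
    if prompt.startswith("Summarize"):
        body = prompt.split("\n")[1:]
        return "\n".join(body[::2])  # keep every other line as a crude "summary"
    if prompt.startswith("Write task code"):
        return "def main():\n    make_burger()\n"
    if prompt.startswith("Define the function make_burger"):
        return "def make_burger():\n    pick_up('patty')\n    place('patty', 'bun')\n"
    raise ValueError("unexpected prompt")


# Toy demonstration: eight hypothetical state/action lines.
demo = [f"step {i}: robot state or action" for i in range(8)]
spec = summarize_recursively(demo, fake_llm, max_lines=4)
code = synthesize_code(spec, fake_llm, primitives={"pick_up", "place"})
print(code)
```

In this sketch the specification is the only thing passed from stage 1 to stage 2, which mirrors the idea of a shared intermediate representation connecting demonstrations and code. The `max_depth` bound and the no-progress check are safeguards added here so the loops always terminate.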
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robotic manipulation task code generation | Tabletop Manipulation simulator | Execution Success Rate | 100 | 30 |
| Robotic Manipulation | RLBench standard (test) | Reach Target Success Rate | 94 | 12 |
| Cross-domain demo-to-code | Real-world demonstrations and deployment Medium-Complexity | Success Rate (SR) | 25 | 11 |
| Cross-domain demo-to-code | Obstruction and Object affordance High-Complexity | Success Rate (SR) | 22.5 | 7 |
| Cross-domain demo-to-code | Kinematic configuration and Gripper type Medium-Complexity | Success Rate (SR) | 25 | 7 |
| Cross-domain demo-to-code | Obstruction and Object affordance Low-Complexity | Success Rate (SR) | 26.67 | 7 |
| Cross-domain demo-to-code | Kinematic configuration and Gripper type Low-Complexity | Success Rate (SR) | 33.33 | 7 |
| Cross-domain demo-to-code | Kinematic configuration and Gripper type High-Complexity | Success Rate (SR) | 20 | 7 |
| Cross-domain demo-to-code | Combination Factor Low-Complexity | Success Rate (SR) | 30 | 7 |
| Cross-domain demo-to-code | Combination Factor Medium-Complexity | Success Rate (SR) | 20 | 7 |