Data Analogies Enable Efficient Cross-Embodiment Transfer
About
Generalist robot policies are trained on demonstrations collected across a wide variety of robots, scenes, and viewpoints. Yet it remains unclear how best to organize and scale such heterogeneous data so that it genuinely improves performance in a given target setting. In this work, we ask: what form of demonstration data is most useful for enabling transfer across robot setups? We conduct controlled experiments that vary end-effector morphology, robot platform appearance, and camera perspective, and compare the effects of simply scaling the number of demonstrations against systematically broadening diversity in different ways. Our simulated experiments show that while perceptual shifts such as viewpoint benefit most from broad diversity, morphology shifts benefit far less from unstructured diversity and instead see the largest gains from data analogies, i.e., paired demonstrations that align scenes, tasks, and/or trajectories across different embodiments. Informed by the simulation results, we improve real-world cross-embodiment transfer success by an average of $22.5\%$ over large-scale, unpaired datasets by changing only the composition of the data.
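One reading of "data analogies" is a dataset construction step that groups demonstrations by shared scene and task, then emits cross-embodiment pairs. The sketch below is a hypothetical illustration of that idea; the `Demo` schema, field names, and `pair_by_analogy` function are our own assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical demonstration record; field names are illustrative only.
@dataclass(frozen=True)
class Demo:
    embodiment: str    # e.g. "widowx", "franka"
    scene: str         # scene identifier
    task: str          # task identifier
    trajectory: tuple  # placeholder for the recorded action sequence

def pair_by_analogy(demos):
    """Group demos by (scene, task) and emit cross-embodiment pairs.

    Each pair aligns the same scene and task across two different
    embodiments -- one possible reading of a 'data analogy'.
    """
    buckets = {}
    for d in demos:
        buckets.setdefault((d.scene, d.task), []).append(d)
    pairs = []
    for group in buckets.values():
        for a, b in product(group, group):
            # Keep each unordered cross-embodiment pair exactly once.
            if a.embodiment < b.embodiment:
                pairs.append((a, b))
    return pairs

demos = [
    Demo("widowx", "kitchen_1", "pen_in_cup", ()),
    Demo("franka", "kitchen_1", "pen_in_cup", ()),
    Demo("widowx", "shelf_2", "book_on_shelf", ()),
]
pairs = pair_by_analogy(demos)
# Only the two kitchen_1 / pen_in_cup demos share a scene and task,
# so exactly one cross-embodiment pair is produced.
```

Under this sketch, unpaired data corresponds to buckets containing a single embodiment, which yield no pairs; broadening alignment to trajectories would add a matching criterion beyond `(scene, task)`.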
Related benchmarks
| Task | Dataset | Success Rate (%) | Rank |
|---|---|---|---|
| Book on Bookshelf | Real-world PiperX→WidowX (test) | 65 | 4 |
| Book on Bookshelf | Real-world WidowX→Franka (test) | 60 | 4 |
| Book on Bookshelf | Real-world WidowX→Piper (test) | 65 | 4 |
| Flip Mug | Simulation Panda robot | 68 | 4 |
| Flip Mug | Simulation Jaco robot | 62 | 4 |
| Open Cabinet | Simulation Panda robot | 55 | 4 |
| Open Cabinet | Simulation Jaco robot | 56 | 4 |
| Pen in Cup | Real-world PiperX→WidowX (test) | 85 | 4 |
| Pen in Cup | Real-world WidowX→Franka (test) | 75 | 4 |
| Pen in Cup | Real-world WidowX→Piper (test) | 90 | 4 |