Differentiable Robot Rendering
About
Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings. A key challenge in applying them to robotic tasks is the modality gap between visual data and action data. We introduce differentiable robot rendering, a method that makes the visual appearance of a robot body directly differentiable with respect to its control parameters. Our model integrates a kinematics-aware deformable model with Gaussian Splatting and is compatible with any robot form factor and any number of degrees of freedom. We demonstrate its capability and usage in applications including reconstructing robot poses from images and controlling robots through vision-language models. Quantitative and qualitative results show that our differentiable rendering model provides effective gradients for robotic control directly from pixels, laying the foundation for future applications of vision foundation models in robotics.
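To make the idea of "gradients for robotic control directly from pixels" concrete, here is a minimal toy sketch of pose reconstruction by gradient descent through a renderer. It is not the model described above: it assumes a hypothetical 2-link planar arm, renders it as a handful of isotropic 2D Gaussians placed by forward kinematics, and uses hand-derived gradients in NumPy instead of a deformable model, 3D Gaussian Splatting, or an autodiff framework. All names and constants (`LINK_LENGTHS`, `FRACTIONS`, `SIGMA`, `reconstruct_pose`, and so on) are illustrative assumptions.

```python
import numpy as np

# All constants are illustrative assumptions, not values from the paper.
LINK_LENGTHS = (1.0, 0.8)          # hypothetical 2-link planar arm
FRACTIONS = (0.5, 1.0)             # where Gaussians sit along each link
SIGMA = 0.3                        # isotropic Gaussian width
AXIS = np.linspace(-2.0, 2.0, 32)
PX, PY = np.meshgrid(AXIS, AXIS)   # pixel coordinates, each of shape (32, 32)


def centers_and_jacobian(theta):
    """Forward kinematics: centers (K, 2) and d(center)/d(theta) (K, 2, 2)."""
    l1, l2 = LINK_LENGTHS
    a1, a12 = theta[0], theta[0] + theta[1]
    u1 = np.array([np.cos(a1), np.sin(a1)])        # link-1 direction
    u12 = np.array([np.cos(a12), np.sin(a12)])     # link-2 direction
    n1 = np.array([-np.sin(a1), np.cos(a1)])       # d(u1)/d(a1)
    n12 = np.array([-np.sin(a12), np.cos(a12)])    # d(u12)/d(a12)
    centers, jac = [], []
    for t in FRACTIONS:  # Gaussians on link 1 depend on theta[0] only
        centers.append(t * l1 * u1)
        jac.append(np.stack([t * l1 * n1, np.zeros(2)], axis=1))
    for t in FRACTIONS:  # Gaussians on link 2 depend on both joints
        centers.append(l1 * u1 + t * l2 * u12)
        jac.append(np.stack([l1 * n1 + t * l2 * n12, t * l2 * n12], axis=1))
    return np.array(centers), np.array(jac)


def render(centers):
    """Sum of isotropic Gaussians; also returns the terms the gradient needs."""
    dx = PX[None] - centers[:, 0, None, None]      # (K, H, W)
    dy = PY[None] - centers[:, 1, None, None]
    weights = np.exp(-(dx**2 + dy**2) / (2.0 * SIGMA**2))
    return weights.sum(axis=0), weights, dx, dy


def loss_and_grad(theta, target):
    """Mean squared pixel error and its gradient with respect to joint angles."""
    centers, jac = centers_and_jacobian(theta)
    image, weights, dx, dy = render(centers)
    residual = image - target
    loss = np.mean(residual**2)
    dl_dimage = 2.0 * residual / residual.size
    # d(weight_k)/d(center_k) = weight_k * (pixel - center_k) / sigma^2
    dl_dcx = np.sum(dl_dimage * weights * dx, axis=(1, 2)) / SIGMA**2
    dl_dcy = np.sum(dl_dimage * weights * dy, axis=(1, 2)) / SIGMA**2
    dl_dc = np.stack([dl_dcx, dl_dcy], axis=1)     # (K, 2)
    grad = np.einsum("kx,kxj->j", dl_dc, jac)      # chain rule through kinematics
    return loss, grad


def reconstruct_pose(target, theta_init, lr=0.3, steps=2000):
    """Plain gradient descent on joint angles, driven only by pixel error."""
    theta = np.array(theta_init, dtype=float)
    for _ in range(steps):
        _, grad = loss_and_grad(theta, target)
        theta -= lr * grad
    return theta


# Toy usage: render a "goal image" at known angles, then recover them from pixels.
target_theta = np.array([0.7, 0.5])
target_image = render(centers_and_jacobian(target_theta)[0])[0]
estimate = reconstruct_pose(target_image, theta_init=[0.5, 0.3])
print("recovered joint angles:", estimate)
```

The point of the sketch is the chain of derivatives: pixel error, to Gaussian centers, to joint angles. Smooth Gaussian footprints are what keep that chain informative even when the rendered arm only partially overlaps the target, although a toy like this can still stall in a local minimum if the initial guess is far from the goal pose.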
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual-goal pose reconstruction | Franka Robot Environment | Success Rate | 92 | 12 |
| Visual-goal pose reconstruction | Fetch Robot Environment | Success Rate (%) | 84 | 12 |
| Visual-goal pose reconstruction | UR5e Robot Environment | Success Rate | 80 | 12 |
| Articulated Object Reconstruction | Robot 2 Arm Airbot Play | mIoU | 57.43 | 4 |
| Pose Reconstruction | Panda-3CAM-Azure | Joint 1 Error (J1 Error) | 0.077 | 4 |
| Visual-goal motion planning | Fetch 2.5 rad bin 1.0 (test) | Success Rate (SR) | 1 | 4 |
| Articulated Object Reconstruction | Furniture 21 IKEA Cabinet | IoU | 57.36 | 4 |
| Articulated Object Reconstruction | Robot 1 Hand Xhand | IoU | 28.53 | 4 |
| Articulated Object Reconstruction | Furniture IKEA Cabinet 09 | IoU | 35.84 | 4 |
| Visual-goal motion planning | Franka 0.5 rad bin 1.0 (test) | Success Rate (SR) | 59.8 | 4 |