OrbiSim: World Models as Differentiable Physics Engines for Embodied Intelligence
About
We present OrbiSim, a novel robotic simulation paradigm that redefines world models as a fully differentiable physics engine for embodied intelligence. Unlike prior world models that focus on unconstrained imagination in latent or visual domains, OrbiSim establishes a unified, physically-grounded pathway that bridges structured scene assets, neural dynamics, and downstream reinforcement learning. By enabling end-to-end differentiability throughout the entire simulation loop -- spanning from explicit state transitions to visual observation generation -- OrbiSim supports tasks traditionally intractable for classical simulators, such as differentiable contact modeling, gradient-based policy optimization under sparse rewards, and intuitive physical inference. Empirical results demonstrate that OrbiSim significantly outperforms state-of-the-art world models in both predictive fidelity and control performance. Furthermore, its consistent responsiveness to asset configurations and physical parameters suggests its potential as a differentiable tool for enhancing robot simulation and policy training.
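To make "gradient-based optimization through the simulation loop" concrete, here is a minimal toy sketch. It is not OrbiSim's method: the hand-written point-mass dynamics stand in for the learned neural dynamics, and every name (`Dual`, `simulate_push`, `optimize_force`) and every parameter value is a hypothetical chosen for illustration. A small forward-mode autodiff class carries the derivative of the final object position with respect to the push force, and plain gradient descent uses that derivative to hit a target position.

```python
class Dual:
    """A value plus its derivative w.r.t. one input (forward-mode autodiff)."""

    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad

    @staticmethod
    def _lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.val - other.val, self.grad - other.grad)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)

    __rmul__ = __mul__


def simulate_push(force, steps=20, dt=0.05, mass=1.0, damping=0.5):
    """Toy 1-D push: constant force on a damped point mass (assumed dynamics).

    Works with plain floats or with Dual numbers; in the latter case the
    returned position also carries d(position)/d(force).
    """
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        acc = (force - damping * vel) * (1.0 / mass)
        vel = vel + acc * dt   # semi-implicit Euler: update velocity first
        pos = pos + vel * dt
    return pos


def optimize_force(target=1.0, lr=0.5, iters=200):
    """Gradient descent on squared final-position error, differentiating
    through every simulation step."""
    force = 0.0
    for _ in range(iters):
        final = simulate_push(Dual(force, 1.0))  # seed d(force)/d(force) = 1
        err = final.val - target
        grad = 2.0 * err * final.grad            # chain rule on err ** 2
        force -= lr * grad
    return force


if __name__ == "__main__":
    f = optimize_force()
    print("force:", f, "final position:", simulate_push(f))
```

Only the final state is scored here, so the loss gives no intermediate feedback; the gradient still reaches the control input because each state transition is differentiable. That is the property the abstract appeals to for sparse-reward settings.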
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Push | robosuite Push | Success Rate | 42.71 | 6 |
| Video-level world modeling | robosuite Push | PSNR (10 steps) | 27.9346 | 6 |
| World Modeling | Robosuite Push In-Distribution (test) | PSNR (10 frames) | 26.7105 | 4 |
| World Modeling | Robosuite Push Out-of-Distribution (test) | PSNR (10 steps) | 27.1867 | 2 |
| World Prediction | robosuite Extended (Push_cubes and Pick_and_place, up to 4 objects) | PSNR (10 steps) | 24.8676 | 2 |
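For readers unfamiliar with the PSNR metric reported above, the following is a minimal sketch of how a rollout PSNR could be computed. The standard per-frame definition is PSNR = 10 log10(MAX^2 / MSE), in dB. How the benchmarks aggregate over the 10 steps is not stated here, so averaging per-frame PSNR is an assumption, as are the function names and the representation of frames as flat lists of pixel values in [0, 1].

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames,
    each given as a flat sequence of pixel values in [0, max_val]."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("frames must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_rollout_psnr(pred_frames, target_frames, max_val=1.0):
    """Average per-frame PSNR over a predicted rollout (assumed aggregation)."""
    if len(pred_frames) != len(target_frames) or len(pred_frames) == 0:
        raise ValueError("rollouts must be non-empty and of equal length")
    scores = [psnr(p, t, max_val) for p, t in zip(pred_frames, target_frames)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
    reference = [[0.1, 0.1, 0.1, 0.1], [0.6, 0.6, 0.6, 0.6]]
    print(mean_rollout_psnr(predicted, reference))
```

Higher is better: a uniform pixel error of 0.1 on a [0, 1] scale corresponds to roughly 20 dB, so the values in the table imply considerably smaller average errors.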