PISTO: Proximal Inference for Stochastic Trajectory Optimization
About
Stochastic trajectory optimization methods like STOMP enable planning with non-differentiable costs, offering substantial flexibility over gradient-based approaches. We show that STOMP implicitly minimizes the KL divergence from a Boltzmann trajectory distribution, revealing an elegant Variational Inference (VI) structure underlying its updates. Building on this insight, we propose the Proximal Inference for Stochastic Trajectory Optimization (PISTO) algorithm, which stabilizes the updates by augmenting the objective with a KL regularization between successive Gaussian proposals. This proximal formulation admits a trust-region interpretation and yields closed-form mean updates computable as expectations under a surrogate distribution. We estimate these expectations via importance-weighted Monte Carlo sampling, producing a simple, derivative-free algorithm that inherits STOMP's ability to handle non-differentiable and discontinuous costs without modification. On robot arm motion planning benchmarks, PISTO achieves an 89% success rate, outperforming CHOMP (63%) and STOMP (68%), while producing shorter, smoother paths at twice the speed of competing stochastic methods. We further validate PISTO on contact-rich MuJoCo locomotion and manipulation tasks, where it consistently outperforms both CEM and MPPI baselines in reward.
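To make the importance-weighted mean update concrete, here is a minimal sketch and not the paper's exact rule. It relies on a standard fact: for the generic KL-regularized objective E_q[S] + temperature · KL(q ‖ q_k), the minimizer is proportional to q_k(τ) exp(−S(τ)/temperature). Whether PISTO's surrogate distribution has exactly this form is an assumption, and the names `proximal_mean_update`, `cost_fn`, and `temperature` are hypothetical.

```python
import numpy as np

def proximal_mean_update(mean, cov, cost_fn, n_samples=64, temperature=1.0, rng=None):
    """One derivative-free mean update for a Gaussian trajectory proposal.

    Assumption (not taken from the paper): the surrogate distribution is
    proportional to q_k(tau) * exp(-cost(tau) / temperature), where q_k is the
    current Gaussian proposal N(mean, cov). A larger temperature keeps the new
    mean closer to the old one, acting like a tighter trust region.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Draw candidate trajectories from the current proposal q_k.
    samples = rng.multivariate_normal(mean, cov, size=n_samples)

    # The cost is only evaluated, never differentiated, so it may be
    # non-differentiable or discontinuous.
    costs = np.array([cost_fn(s) for s in samples])

    # Self-normalized importance weights; subtracting the minimum cost
    # avoids underflow in the exponential without changing the weights.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()

    # Monte Carlo estimate of the surrogate mean.
    return weights @ samples
```

A call such as `proximal_mean_update(np.zeros(4), np.eye(4), lambda x: np.abs(x - 1.0).sum())` returns a new mean of the same shape; repeating the call with the returned mean gives an iterative optimizer. The covariance is held fixed here purely for simplicity.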
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Trajectory Optimization | Push T | Time (s) | 134.6 | 8 |
| Trajectory Optimization | Walker2D | Computational Time (s) | 65.99 | 8 |
| Trajectory Optimization | Humanoid Standup | Computational Time (s) | 65.29 | 8 |
| Motion Planning | 7-DOF Manipulator | Success Rate (%) | 88.57 | 4 |
| Trajectory Optimization | Hopper | Reward (Per Step) | 1.2645 | 3 |
| Trajectory Optimization | HumanoidRun | Cumulative Reward per Step | 1.3385 | 3 |