Mollified Value Learning
About
Offline goal-conditioned reinforcement learning (GCRL) learns goal-reaching behaviors from static datasets, but accurate value estimation remains challenging under limited state-action coverage. Existing physics-informed approaches address this by imposing pointwise, distance-like geometric constraints derived from Hamilton-Jacobi-Bellman (HJB) optimality principles, often through first-order partial differential equations such as the Eikonal equation. However, enforcing local consistency through explicit differential structure can become unstable in complex, high-dimensional environments. Our key insight is to instead reinterpret distance-like constraints as an expectation over a local spatial measure. By aggregating constraints over this measure rather than evaluating them pointwise, the objective acts as a spatial mollifier, inducing distance-like value geometry without requiring expensive differential operators. We refer to this as Mollified Value Learning (MVL). Experiments across navigation and high-dimensional robotic manipulation tasks show that MVL learns structured value representations and improves goal-reaching performance when used with implicit value representation learning methods. The open-source code is available at https://github.com/HrishikeshVish/MVL.
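To make the phrase "expectation over a local spatial measure" concrete, the following is a minimal NumPy sketch and not the MVL objective from the paper or its repository. It assumes a toy value function equal to the negative Euclidean distance to the goal, a Gaussian local measure with hypothetical width `sigma`, and a hypothetical gradient-free penalty (`local_lipschitz_penalty`) that averages violations of the 1-Lipschitz condition over sampled neighbors. All function names and parameters are illustrative assumptions.

```python
import numpy as np


def value(s, g):
    """Toy distance-like value (assumption): negative Euclidean distance to g."""
    return -np.linalg.norm(s - g, axis=-1)


def mollified_value(v_fn, s, g, sigma=0.1, n_samples=20000, seed=0):
    """Monte Carlo estimate of E_{d ~ N(0, sigma^2 I)}[v_fn(s + d, g)].

    Averaging over a local Gaussian measure is a standard mollification:
    it smooths kinks in v_fn without computing any derivative.
    """
    rng = np.random.default_rng(seed)
    d = rng.normal(scale=sigma, size=(n_samples, s.shape[-1]))
    return v_fn(s + d, g).mean()


def local_lipschitz_penalty(v_fn, s, g, sigma=0.1, n_samples=20000, seed=0):
    """Hypothetical aggregated constraint (not the paper's loss).

    Penalizes |v_fn(s + d) - v_fn(s)| > ||d|| on average over local samples d,
    a gradient-free stand-in for a pointwise unit-slope (Eikonal-style) bound.
    """
    rng = np.random.default_rng(seed)
    d = rng.normal(scale=sigma, size=(n_samples, s.shape[-1]))
    gap = np.abs(v_fn(s + d, g) - v_fn(s, g)) - np.linalg.norm(d, axis=-1)
    return np.mean(np.maximum(gap, 0.0) ** 2)


def steep_value(s, g):
    """A value with slope 3, which violates the unit-slope bound."""
    return 3.0 * value(s, g)


if __name__ == "__main__":
    goal = np.zeros(2)
    state = np.array([1.0, 0.0])
    print("mollified value:", mollified_value(value, state, goal))
    print("penalty (distance):", local_lipschitz_penalty(value, state, goal))
    print("penalty (steep):", local_lipschitz_penalty(steep_value, state, goal))
```

In this sketch the true distance incurs essentially zero penalty by the reverse triangle inequality, while the steeper function is penalized, and neither check requires a gradient of the value function. How MVL actually defines its local measure and objective should be taken from the paper and the linked repository.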
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL Franka Kitchen | Mixed Success Rate | 84 | 43 |
| Robotic Manipulation | D4RL Kitchen-Partial | Normalized Score | 100 | 23 |
| Robotic Manipulation | D4RL Kitchen-Mixed | -- | -- | 14 |
| Manipulation | cube-double-play oraclerep v0 | Task 1 Success Rate | 96 | 9 |
| Manipulation | scene-play oraclerep v0 | Task 1 Success Rate | 98 | 9 |
| Manipulation | puzzle 4x4-play-oraclerep v0 | Task 1 Success Rate | 62 | 9 |
| Navigation | pointmaze large-navigate-oraclerep v0 | Task 1 Success Rate | 100 | 9 |
| Offline goal-conditioned RL | OGBench Navigation | Success Rate (PointMaze-Medium) | 96 | 9 |
| Offline goal-conditioned RL | OGBench Manipulation | Success Rate (Cube Single) | 91 | 9 |
| Manipulation | cube-single-play-oraclerep v0 | Task 1 Success Rate | 95 | 9 |