Mollified Value Learning

About

Offline goal-conditioned reinforcement learning (GCRL) learns goal-reaching behaviors from static datasets, but accurate value estimation remains challenging under limited state-action coverage. Existing physics-informed approaches address this by imposing pointwise, distance-like geometric constraints derived from Hamilton-Jacobi-Bellman (HJB) optimality principles, often through first-order partial differential equations such as the Eikonal equation. However, enforcing local consistency through explicit differential structure can become unstable in complex, high-dimensional environments. Our key insight is to instead reinterpret distance-like constraints as an expectation over a local spatial measure. By aggregating constraints over this measure rather than evaluating them pointwise, the objective acts as a spatial mollifier, inducing distance-like value geometry without requiring expensive differential operators. We refer to this as Mollified Value Learning (MVL). Experiments across navigation and high-dimensional robotic manipulation tasks show that MVL learns structured value representations and improves goal-reaching performance when used with implicit value representation learning methods. Open-source code is available at https://github.com/HrishikeshVish/MVL.
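
To make the idea of "aggregating a constraint over a local spatial measure" concrete, here is a minimal sketch in plain Python. It is not the MVL objective from the paper, which this page does not spell out. As a stand-in, it assumes a 1-Lipschitz (distance-like) constraint, a Gaussian local measure with hypothetical width `sigma`, and a toy value function; all function and parameter names are illustrative. The point it shows is that the penalty is estimated from sampled neighbors of a state, so no gradient of the value function is ever computed.

```python
import math
import random

def distance_value(s, g, scale=1.0):
    """Toy goal-conditioned value (assumption): negative scaled Euclidean distance."""
    return -scale * math.dist(s, g)

def mollified_lipschitz_penalty(value_fn, s, g, sigma=0.1, n_samples=256, seed=0):
    """Monte Carlo estimate of
        E_delta[ max(0, |V(s + delta, g) - V(s, g)| - ||delta||)^2 ],
    with delta ~ N(0, sigma^2 I). Only evaluations of value_fn are used."""
    rng = random.Random(seed)
    v_s = value_fn(s, g)
    total = 0.0
    for _ in range(n_samples):
        # Sample a nearby state from the (assumed) Gaussian local measure.
        delta = [rng.gauss(0.0, sigma) for _ in s]
        s_near = [si + di for si, di in zip(s, delta)]
        step = math.hypot(*delta)
        # Penalize value changes larger than the distance actually moved.
        violation = max(0.0, abs(value_fn(s_near, g) - v_s) - step)
        total += violation ** 2
    return total / n_samples

s, g = [0.0, 0.0], [3.0, 4.0]
ok = mollified_lipschitz_penalty(distance_value, s, g)
bad = mollified_lipschitz_penalty(
    lambda a, b: distance_value(a, b, scale=2.0), s, g
)
```

Here `ok` is zero up to floating-point error, because a true Euclidean distance satisfies the constraint for every perturbation, while `bad` is strictly positive because the doubled value changes faster than the state moves. In a learning setting, an averaged penalty of this kind would be added to the value loss in place of a pointwise gradient-norm term.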

Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Mihir Chauhan, Damon Conover, Ziran Wang, Aniket Bera • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Offline Reinforcement Learning | D4RL Franka Kitchen | Mixed Success Rate: 84 | 43 |
| Robotic Manipulation | D4RL Kitchen-Partial | Normalized Score: 100 | 23 |
| Robotic Manipulation | D4RL Kitchen-Mixed | -- | 14 |
| Manipulation | cube-double-play oraclerep v0 | Task 1 Success Rate: 96 | 9 |
| Manipulation | scene-play oraclerep v0 | Task 1 Success Rate: 98 | 9 |
| Manipulation | puzzle 4x4-play-oraclerep v0 | Task 1 Success Rate: 62 | 9 |
| Navigation | pointmaze large-navigate-oraclerep v0 | Task 1 Success Rate: 100 | 9 |
| Offline goal-conditioned RL | OGBench Navigation | Success Rate (PointMaze-Medium): 96 | 9 |
| Offline goal-conditioned RL | OGBench Manipulation | Success Rate (Cube Single): 91 | 9 |
| Manipulation | cube-single-play-oraclerep v0 | Task 1 Success Rate: 95 | 9 |
Showing 10 of 17 rows
