Mollified Value Learning

About

Offline goal-conditioned reinforcement learning (GCRL) learns goal-reaching behaviors from static datasets, but accurate value estimation remains challenging under limited state-action coverage. Existing physics-informed approaches address this by imposing pointwise, distance-like geometric constraints derived from Hamilton-Jacobi-Bellman (HJB) optimality principles, often through first-order partial differential equations such as the Eikonal equation. However, enforcing local consistency through explicit differential structure can become unstable in complex, high-dimensional environments. Our key insight is to instead reinterpret distance-like constraints as an expectation over a local spatial measure. By aggregating constraints over this measure rather than evaluating them pointwise, the objective acts as a spatial mollifier, inducing distance-like value geometry without requiring expensive differential operators. We refer to this as Mollified Value Learning (MVL). Experiments across navigation and high-dimensional robotic manipulation tasks show that MVL learns structured value representations and improves goal-reaching performance when used with implicit value representation learning methods. Open-source code is available at https://github.com/HrishikeshVish/MVL.
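
The contrast between a pointwise differential constraint and a constraint aggregated over a local measure can be illustrated with a small sketch. This is not the MVL objective from the paper or its repository; it is a toy example under stated assumptions. The Gaussian perturbation measure, the hinge-style 1-Lipschitz penalty, and all function names (`pointwise_eikonal_residual`, `mollified_residual`, `euclid`, `scaled`) are hypothetical choices made only for illustration.

```python
import numpy as np

def pointwise_eikonal_residual(d, s, g, h=1e-4):
    """Central-difference estimate of (||grad_s d(s, g)|| - 1)^2 at one state."""
    grad = np.zeros_like(s)
    for i in range(s.size):
        e = np.zeros_like(s)
        e[i] = h
        grad[i] = (d(s + e, g) - d(s - e, g)) / (2 * h)
    return (np.linalg.norm(grad) - 1.0) ** 2

def mollified_residual(d, s, g, sigma=0.1, n_samples=256, rng=None):
    """Monte Carlo estimate of E[max(0, |d(s+eps, g) - d(s, g)| - ||eps||)^2]
    with eps ~ N(0, sigma^2 I). Assumed surrogate: it needs no gradients and
    averages a distance-like (1-Lipschitz) condition over a neighborhood of s."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.normal(scale=sigma, size=(n_samples, s.size))
    base = d(s, g)
    diffs = np.array([abs(d(s + e, g) - base) for e in eps])
    violation = np.maximum(0.0, diffs - np.linalg.norm(eps, axis=1))
    return float(np.mean(violation ** 2))

def euclid(s, g):
    """True Euclidean distance: unit gradient norm away from the goal."""
    return float(np.linalg.norm(s - g))

def scaled(s, g):
    """A deliberately mis-scaled 'distance' with gradient norm 3."""
    return 3.0 * euclid(s, g)

s = np.array([1.0, 2.0])
g = np.zeros(2)
for name, d in [("euclid", euclid), ("scaled", scaled)]:
    print(name, pointwise_eikonal_residual(d, s, g), mollified_residual(d, s, g))
```

In this toy setting both residuals vanish (up to numerical error) for the true Euclidean distance and are positive for the mis-scaled function, but only the first requires a derivative estimate at a single point.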

Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Mihir Chauhan, Damon Conover, Ziran Wang, Aniket Bera • 2026

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	D4RL Franka Kitchen	Mixed Success Rate: 84	43
Robotic Manipulation	D4RL Kitchen-Partial	Normalized Score: 100	23
Robotic Manipulation	D4RL Kitchen-Mixed	--	14
Manipulation	cube-double-play oraclerep v0	Task 1 Success Rate: 96	9
Manipulation	scene-play oraclerep v0	Task 1 Success Rate: 98	9
Manipulation	puzzle 4x4-play-oraclerep v0	Task 1 Success Rate: 62	9
Navigation	pointmaze large-navigate-oraclerep v0	Task 1 Success Rate: 100	9
Offline goal-conditioned RL	OGBench Navigation	Success Rate (PointMaze-Medium): 96	9
Offline goal-conditioned RL	OGBench Manipulation	Success Rate (Cube Single): 91	9
Manipulation	cube-single-play-oraclerep v0	Task 1 Success Rate: 95	9

Showing 10 of 17 rows
