# Neural Value Iteration

## About
The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as $\alpha$-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on $\alpha$-vectors at reachable belief points until convergence. However, since each $\alpha$-vector is $|S|$-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called *Neural Value Iteration*, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
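To make the PWLC representation concrete, the sketch below evaluates a value function stored as a finite set of $\alpha$-vectors: the value at a belief $b$ is $V(b) = \max_{\alpha} \langle \alpha, b \rangle$, the upper surface of the hyperplanes. The numbers and the three-state setup are toy values for illustration only, not taken from the paper; the paper's contribution is to replace each $|S|$-dimensional $\alpha$-vector with a neural network while keeping this same max-over-a-finite-set structure.

```python
import numpy as np

# Toy PWLC value function over |S| = 3 states.
# Each row is one alpha-vector, i.e. one hyperplane of the value function
# (values are illustrative, not from the paper).
alpha_vectors = np.array([
    [1.0, 0.0, 0.5],
    [0.2, 0.9, 0.1],
    [0.4, 0.4, 0.8],
])

def value(belief, alphas):
    """V(b) = max over alpha-vectors of <alpha, b> (the PWLC property)."""
    return float(np.max(alphas @ belief))

# A belief is a probability distribution over the states.
b = np.array([0.5, 0.3, 0.2])
print(value(b, alpha_vectors))  # -> 0.6, attained by the first hyperplane
```

In the neural representation described above, `alphas @ belief` would be replaced by evaluating each network in the set at the belief, with the max taken over the networks' outputs instead of over dot products.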
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| POMDP Planning | RockSample (15, 15) | Expected Return | 13.6 | 19 |
| POMDP Planning | RockSample (20, 20) | Expected Return | 12.31 | 10 |
| POMDP Planning | RockSample (7, 8) | Expected Return | 18.95 | 5 |
| POMDP Planning | RockSample (11, 11) | Expected Return | 17.51 | 5 |
| POMDP Planning | Light Dark | Expected Return | 3.73 | 4 |
| POMDP Planning | Lidar Roomba | Expected Return | 2.03 | 4 |