
Neural Value Iteration

About

The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as $\alpha$-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on $\alpha$-vectors at reachable belief points until convergence. However, since each $\alpha$-vector is $|S|$-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called Neural Value Iteration, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
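To make the PWLC representation concrete, the following is a minimal sketch of how a value function stored as $\alpha$-vectors is evaluated at a belief: $V(b) = \max_{\alpha} \sum_{s} \alpha(s)\, b(s)$. The function name `value_at_belief` and the two-state numbers are hypothetical, chosen only for illustration; this shows the classical tabular representation, not the neural representation proposed in the paper.

```python
from typing import List, Sequence, Tuple


def value_at_belief(
    alpha_vectors: Sequence[Sequence[float]],
    belief: Sequence[float],
) -> Tuple[float, int]:
    """Evaluate V(b) = max over alpha of sum_s alpha(s) * b(s).

    Returns the value and the index of the maximizing alpha-vector.
    Each alpha-vector has one entry per state, so it is |S|-dimensional.
    """
    best_value, best_index = float("-inf"), -1
    for i, alpha in enumerate(alpha_vectors):
        # Dot product between one hyperplane and the belief.
        v = sum(a * b for a, b in zip(alpha, belief))
        if v > best_value:
            best_value, best_index = v, i
    return best_value, best_index


# Hypothetical two-state problem with three alpha-vectors (illustrative values).
alphas: List[List[float]] = [[1.0, 0.0], [0.0, 1.0], [0.75, 0.75]]

for b in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(b, value_at_belief(alphas, b))
```

Because every $\alpha$-vector carries one entry per state, both storage and each dot product grow linearly with $|S|$, which is the cost the abstract identifies as the bottleneck for large problems.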

Yang You, Ufuk Çakır, Alex Schutz, Nick Hawes • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| POMDP Planning | RockSample (15, 15) | Expected Return: 13.6 | 19 |
| POMDP Planning | RockSample (20, 20) | Expected Return: 12.31 | 10 |
| POMDP Planning | RockSample (7, 8) | Expected Return: 18.95 | 5 |
| POMDP Planning | RockSample (11, 11) | Expected Return: 17.51 | 5 |
| POMDP Planning | Light Dark | Expected Return: 3.73 | 4 |
| POMDP Planning | Lidar Roomba | Expected Return: 2.03 | 4 |
