# Neural Value Iteration

## About
The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as $\alpha$-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on $\alpha$-vectors at reachable belief points until convergence. However, since each $\alpha$-vector is $|S|$-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called *Neural Value Iteration*, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
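To make the PWLC representation concrete, the sketch below evaluates a value function stored as a finite set of $\alpha$-vectors: the value at a belief $b$ is $V(b) = \max_{\alpha} \langle \alpha, b \rangle$, the upper surface of the hyperplanes. The numbers and the three-state setup are toy values for illustration only, not taken from the paper; the paper's contribution is to replace each $|S|$-dimensional $\alpha$-vector with a neural network while keeping this same max-over-a-finite-set structure.

```python
import numpy as np

# Toy PWLC value function over |S| = 3 states.
# Each row is one alpha-vector, i.e. one hyperplane of the value function
# (values are illustrative, not from the paper).
alpha_vectors = np.array([
    [1.0, 0.0, 0.5],
    [0.2, 0.9, 0.1],
    [0.4, 0.4, 0.8],
])

def value(belief, alphas):
    """V(b) = max over alpha-vectors of <alpha, b> (the PWLC property)."""
    return float(np.max(alphas @ belief))

# A belief is a probability distribution over the states.
b = np.array([0.5, 0.3, 0.2])
print(value(b, alpha_vectors))  # -> 0.6, attained by the first hyperplane
```

In the neural representation described above, `alphas @ belief` would be replaced by evaluating each network in the set at the belief, with the max taken over the networks' outputs instead of over dot products.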
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| POMDP Planning | RockSample (15, 15) | Expected Return | 13.6 | 19 |
| POMDP Planning | RockSample (20, 20) | Expected Return | 12.31 | 10 |
| POMDP Planning | RockSample (7, 8) | Expected Return | 18.95 | 5 |
| POMDP Planning | RockSample (11, 11) | Expected Return | 17.51 | 5 |
| POMDP Planning | Light Dark | Expected Return | 3.73 | 4 |
| POMDP Planning | Lidar Roomba | Expected Return | 2.03 | 4 |