Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function

About

We study risk-sensitive planning under partial observability using the dynamic risk measure Iterated Conditional Value-at-Risk (ICVaR). A policy evaluation algorithm for ICVaR is developed with finite-time performance guarantees that do not depend on the cardinality of the action space. Building on this foundation, three widely used online planning algorithms--Sparse Sampling, Particle Filter Trees with Double Progressive Widening (PFT-DPW), and Partially Observable Monte Carlo Planning with Observation Widening (POMCPOW)--are extended to optimize the ICVaR value function rather than the expectation of the return. Our formulations introduce a risk parameter $\alpha$, where $\alpha = 1$ recovers standard expectation-based planning and smaller values of $\alpha < 1$ induce greater risk aversion. For ICVaR Sparse Sampling, we establish finite-time performance guarantees under the risk-sensitive objective, which further enable a novel exploration strategy tailored to ICVaR. Experiments on benchmark POMDP domains demonstrate that the proposed ICVaR planners achieve lower tail risk than their risk-neutral counterparts.
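To make the role of $\alpha$ concrete, the following is a minimal sketch of an iterated CVaR backup on a small, hand-built tree of sampled outcomes. It is not the paper's algorithm: the function names `cvar` and `icvar`, the `(cost, children)` node layout, the cost convention (larger is worse, so CVaR averages the worst $\alpha$-fraction), and the equal weighting of samples are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical node layout: (immediate_cost, list_of_child_nodes).
Node = Tuple[float, list]


def cvar(costs: List[float], alpha: float) -> float:
    """CVaR of equally weighted cost samples: mean of the worst alpha-fraction.

    Assumes costs (larger is worse). alpha = 1 gives the plain mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not costs:
        raise ValueError("costs must be non-empty")
    weight = 1.0 / len(costs)   # each sample carries equal probability mass
    remaining = alpha           # tail probability mass still to be filled
    total = 0.0
    for c in sorted(costs, reverse=True):  # worst outcomes first
        take = min(weight, remaining)
        total += c * take
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def icvar(node: Node, alpha: float, gamma: float = 1.0) -> float:
    """Iterated CVaR: apply CVaR to the children's values at every level."""
    cost, children = node
    if not children:
        return cost
    child_values = [icvar(child, alpha, gamma) for child in children]
    return cost + gamma * cvar(child_values, alpha)


# Two-level example tree with two sampled successors per node.
tree: Node = (0.0, [
    (1.0, [(0.0, []), (4.0, [])]),
    (2.0, [(1.0, []), (1.0, [])]),
])

risk_neutral = icvar(tree, alpha=1.0)   # expectation at every level
risk_averse = icvar(tree, alpha=0.5)    # worst half at every level
print(risk_neutral, risk_averse)
```

With `alpha=1.0` the recursion reduces to the ordinary expected cost, while `alpha=0.5` keeps only the worse half of the successors at each level, so the high-variance branch dominates the value.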

Yaacov Pariente, Vadim Indelman • 2026

Related benchmarks

Task	Dataset	Metric	Result	Rank
POMDP Planning	LaserTag (D,D,C) (test)	ICVaR	12.47	4
POMDP Planning	LightDark (C,C,C) (test)	ICVaR	16.72	4
