Accelerating Policy Synthesis in Large-Scale MDPs via Hierarchical Adaptive Refinement

About

Software-intensive systems, such as software product lines and robotics, utilise Markov decision processes (MDPs) to capture uncertainty and analyse sequential decision-making problems. Despite the usefulness of conventional policy synthesis methods, they fail to scale to large state spaces. Our approach addresses this issue and accelerates policy synthesis in large MDPs by dynamically refining the MDP and iteratively selecting the most fragile MDP regions for refinement. This iterative procedure offers a balance between accuracy and efficiency, as refinement occurs only when necessary. We formally show that the composed policy is near-optimal under standard assumptions, with error bounded by the local solver tolerance and boundary mismatch. Across diverse case studies and MDPs up to 1M states, we demonstrate that our approach achieves up to $2\times$ speedup over PRISM, offering a competitive solution for real-world policy synthesis in large MDPs.

Alexandros Evangelidis, Gricel V\'azquez, Simos Gerasimou• 2025

Related benchmarks

Task	Dataset	Result
Policy Synthesis	Warehouse MDPs Property: Rmin R{“steps”}min=? [F “goal”] W512/W1024	Time (s)12.49	30
Policy Synthesis	Warehouse MDPs Property: Pmax F “goal” W512 W1024	Time (s)7.61	30
Policy Synthesis	arenas Rmin 16	Execution Time (s)152.8	2
Policy Synthesis	arenas8 Rmin	Execution Time (s)2.94	2
Policy Synthesis	wlan3 Pmax	Time (s)5.28	2
Policy Synthesis	firewire Rmin	Execution Time (s)0.187	2
Policy Synthesis	consensus Pmax	Time (s)5.414	2
Policy Synthesis	zeroconf Rmin	Execution Time (s)0.224	2

Showing 8 of 8 rows

Other info

Follow for update

@wizwand_team Discord