Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Reinforcement Learning for Reachability: Guaranteeing Asymptotic Optimality

About

Reinforcement learning (RL) for reachability specifications is fundamental in sequential decision-making, yet theoretical guarantees remain less explored. A recent work achieves asymptotic convergence to optimal policies. However, this approach provides limited insight into convergence dynamics. In this work, we present an alternative approach that provides deeper theoretical insights into convergence. Our approach builds on PAC learning with assumptions. PAC learning guarantees near-optimal policies with high confidence in finite time but requires knowing internal MDP parameters like minimum transition probability. We argue that while these parameters are unknown in RL, they can be iteratively refined and estimated with increasing accuracy. By iteratively satisfying PAC conditions, we show that exact optimality can be achieved in the limit. Empirical evaluations on standard benchmarks validate our theoretical insights into convergence dynamics.

Amogh Palasamudram, Jakub Svoboda, Suguman Bansal, Krishnendu Chatterjee• 2026

Related benchmarks

TaskDatasetResultRank
MDP Policy ReachabilityConsensus 2
State Coverage (%)100
1
MDP Policy ReachabilityCSMA 2-2
State Coverage93.1
1
MDP Policy Reachabilityfirewire abst
Percent States Seen100
1
MDP Policy ReachabilityIJ 3
State Coverage (%)100
1
MDP Policy ReachabilityIJ 10
State Coverage100
1
MDP Policy ReachabilityPacman
State Coverage47.2
1
MDP Policy ReachabilityPhilosophers MDP 3
State Coverage46
1
MDP Policy ReachabilityRabin 3
State Coverage (%)3.9
1
MDP Policy ReachabilityZeroconf
Percent States Seen14.2
1
Showing 9 of 9 rows

Other info

Follow for update