Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Value Iteration for Simple Stochastic Games: Stopping Criterion and Learning Algorithm

About

Simple stochastic games can be solved by value iteration (VI), which yields a sequence of under-approximations of the value of the game. This sequence is guaranteed to converge to the value only in the limit. Since no stopping criterion is known, this technique does not provide any guarantees on its results. We provide the first stopping criterion for VI on simple stochastic games. It is achieved by additionally computing a convergent sequence of over-approximations of the value, relying on an analysis of the game graph. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. As another consequence, we can provide a simulation-based asynchronous VI algorithm, which yields the same guarantees, but without necessarily exploring the whole game graph.

Edon Kelmendi, Julia Kr\"amer, Jan Kretinsky, Maximilian Weininger• 2018

Related benchmarks

TaskDatasetResultRank
Value Iterationcsma all before max
Iterations16
2
Value Iterationcsma all before min
Iterations16
2
Value Iterationcsma some before
Iterations6
2
Value IterationFirewire DL
Iterations5
2
Value IterationPacman
Iterations2
2
Value Iterationwlan_dl
Iterations333
2
Value Iterationconsensus c2
Iterations9.45e+3
2
Value Iterationconsensus disagree
Iterations1.09e+4
2
Value Iterationwlan
Iterations275
2
Value Iterationzeroconf correct max
Iterations136
2
Showing 10 of 13 rows

Other info

Follow for update