Value Iteration for Simple Stochastic Games: Stopping Criterion and Learning Algorithm

About

Simple stochastic games can be solved by value iteration (VI), which yields a sequence of under-approximations of the value of the game. This sequence is guaranteed to converge to the value only in the limit. Since no stopping criterion is known, this technique does not provide any guarantees on its results. We provide the first stopping criterion for VI on simple stochastic games. It is achieved by additionally computing a convergent sequence of over-approximations of the value, relying on an analysis of the game graph. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. As another consequence, we can provide a simulation-based asynchronous VI algorithm, which yields the same guarantees, but without necessarily exploring the whole game graph.

Edon Kelmendi, Julia Kr\"amer, Jan Kretinsky, Maximilian Weininger• 2018

Related benchmarks

Task	Dataset	Result
Value Iteration	csma all before max	Iterations16	2
Value Iteration	csma all before min	Iterations16	2
Value Iteration	csma some before	Iterations6	2
Value Iteration	Firewire DL	Iterations5	2
Value Iteration	Pacman	Iterations2	2
Value Iteration	wlan_dl	Iterations333	2
Value Iteration	consensus c2	Iterations9.45e+3	2
Value Iteration	consensus disagree	Iterations1.09e+4	2
Value Iteration	wlan	Iterations275	2
Value Iteration	zeroconf correct max	Iterations136	2

Showing 10 of 13 rows

Other info

Follow for update

@wizwand_team Discord