Differentiable Arbitrating in Zero-sum Markov Games

About

We initiate the study of arbitrating in two-player zero-sum Markov games: perturbing the reward so that the game induces a desirable Nash equilibrium. This problem admits a bi-level optimization formulation. The lower level requires solving for the Nash equilibrium under a given reward function, which makes the overall problem hard to optimize end to end. We propose a backpropagation scheme that differentiates through the Nash equilibrium and thereby provides gradient feedback to the upper level. Notably, our method requires only a black-box solver for the (regularized) Nash equilibrium (NE). We develop a convergence analysis of the proposed framework with suitable black-box NE solvers and demonstrate its empirical success in two multi-agent reinforcement learning (MARL) environments.
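The idea of backpropagating through an equilibrium can be illustrated on the simplest special case. The sketch below is not the authors' implementation; it assumes a single-state zero-sum matrix game (a one-state Markov game) with reward matrix `A` paid to the maximizing player, entropy regularization with temperature `tau`, and a damped fixed-point iteration standing in for the black-box regularized NE solver. Under these assumptions the regularized NE satisfies `x = softmax(A y / tau)` and `y = softmax(-A^T x / tau)`, so the implicit function theorem yields the gradient of an upper-level loss with respect to the reward by solving one linear (adjoint) system at the equilibrium. The upper-level loss (squared distance of the max player's strategy to a hypothetical target `x_target`, plus an L2 penalty `lam` on the reward perturbation `P`) and all function names are illustrative choices.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax of a 1-D array.
    e = np.exp(z - z.max())
    return e / e.sum()


def solve_regularized_ne(A, tau, iters=500, damping=0.5):
    # Hypothetical black-box solver: entropy-regularized NE of max_x min_y x^T A y,
    # found by damped fixed-point iteration on the quantal-response equations
    # x = softmax(A y / tau), y = softmax(-A^T x / tau).
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    for _ in range(iters):
        x_new = softmax(A @ y / tau)
        y_new = softmax(-A.T @ x / tau)
        x = (1 - damping) * x + damping * x_new
        y = (1 - damping) * y + damping * y_new
    return x, y


def hypergradient(A, tau, x, y, grad_x, grad_y):
    # Gradient of an upper-level loss L(x, y) w.r.t. the reward matrix A,
    # via the implicit function theorem at the fixed point z = F(z, A).
    m, n = A.shape
    Sx = np.diag(x) - np.outer(x, x)  # softmax Jacobian at x (symmetric)
    Sy = np.diag(y) - np.outer(y, y)  # softmax Jacobian at y (symmetric)
    J = np.zeros((m + n, m + n))      # dF/dz for z = (x, y)
    J[:m, m:] = Sx @ A / tau
    J[m:, :m] = -Sy @ A.T / tau
    # Adjoint system (I - J)^T v = dL/dz.
    v = np.linalg.solve((np.eye(m + n) - J).T, np.concatenate([grad_x, grad_y]))
    vx, vy = v[:m], v[m:]
    # Contract v with dF/dA for the x-block and the y-block.
    return np.outer(Sx @ vx, y) / tau - np.outer(x, Sy @ vy) / tau


# Upper level: choose a reward perturbation P so that the max player's
# regularized NE strategy moves toward a (hypothetical) target.
rng = np.random.default_rng(0)
A0 = rng.uniform(-1.0, 1.0, size=(3, 3))  # base reward paid to the max player
x_target = np.array([0.45, 0.35, 0.20])
tau, lam, lr = 2.0, 0.1, 1.0


def objective(P):
    x, _ = solve_regularized_ne(A0 + P, tau)
    return 0.5 * np.sum((x - x_target) ** 2) + 0.5 * lam * np.sum(P ** 2)


P = np.zeros_like(A0)
history = [objective(P)]
for _ in range(200):
    A = A0 + P
    x, y = solve_regularized_ne(A, tau)
    grad_P = hypergradient(A, tau, x, y, x - x_target, np.zeros_like(y)) + lam * P
    P = P - lr * grad_P
history.append(objective(P))
print(f"upper-level objective: {history[0]:.5f} -> {history[-1]:.5f}")
```

The hypergradient uses only the equilibrium `(x, y)` returned by the solver, not its iterates, which is what allows the solver to be treated as a black box. The sketch assumes the payoff scale is small relative to `tau` (entries of `A0` in [-1, 1], `tau = 2`), so that the damped iteration converges and `I - J` is invertible; larger rewards or smaller temperatures would need a stronger solver.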

Jing Wang, Meichen Song, Feng Gao, Boyi Liu, Zhaoran Wang, Yi Wu • 2023

Related benchmarks

Task	Dataset	Result	Rank
Bilevel Reinforcement Learning	Bilevel Optimization over Saddle Points LL Problem: Min-Max	Iteration Complexity 1	3

