Revisiting Regularized Policy Optimization for Stable and Efficient Reinforcement Learning in Two-Player Games

About

Two-player games such as board games have long served as benchmarks for reinforcement learning. This work revisits a policy optimization method with reverse Kullback-Leibler regularization and entropy regularization and analyzes this combination in two-player zero-sum settings from theoretical and empirical perspectives. From a theoretical perspective, we investigate the stability of the policy update rule in two settings: normal-form games and finite-length games. We provide novel convergence guarantees and verify our theoretical results through numerical experiments on synthetic games. From an empirical perspective, we derive a practical model-free reinforcement learning algorithm based on this regularized policy optimization. We validate the training efficiency of our algorithm through comprehensive experiments on five board games: Animal Shogi, Gardner Chess, Go, Hex, and Othello. Experimental results show that our agent learns more efficiently than existing methods across these environments.

Kazuki Ota, Takayuki Osa, Motoki Omura, Tatsuya Harada • 2026
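
To make the kind of update described in the abstract concrete, here is a minimal sketch of a policy step with reverse-KL and entropy regularization in a normal-form zero-sum game. It is not taken from the paper: the objective `<pi, q> - alpha * KL(pi || pi_old) + tau * H(pi)`, the names `regularized_update` and `run`, the coefficients `alpha` and `tau`, and the rock-paper-scissors payoff matrix are all illustrative assumptions. Under that assumed objective, the maximizer has the closed form `pi ∝ pi_old^(alpha / (alpha + tau)) * exp(q / (alpha + tau))`.

```python
import math

def regularized_update(pi_old, q, alpha, tau):
    """Closed-form maximizer of <pi, q> - alpha*KL(pi || pi_old) + tau*H(pi).

    alpha (KL weight) and tau (entropy weight) are assumed, illustrative
    hyperparameters, not values from the paper.
    """
    w = alpha / (alpha + tau)
    logits = [w * math.log(p) + qi / (alpha + tau) for p, qi in zip(pi_old, q)]
    m = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(l - m) for l in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Rock-paper-scissors payoff for the row player (hypothetical test game).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def run(steps=2000, alpha=20.0, tau=0.2):
    """Simultaneous self-play updates for both players on the game A."""
    x = [0.7, 0.2, 0.1]  # row player's initial policy
    y = [0.1, 0.3, 0.6]  # column player's initial policy
    for _ in range(steps):
        # Expected payoff of each action against the opponent's current policy.
        qx = [sum(A[i][j] * y[j] for j in range(3)) for i in range(3)]
        qy = [-sum(A[i][j] * x[i] for i in range(3)) for j in range(3)]
        x, y = (regularized_update(x, qx, alpha, tau),
                regularized_update(y, qy, alpha, tau))
    return x, y
```

Calling `run()` returns both players' policies after the given number of simultaneous updates. In this symmetric game the entropy-regularized equilibrium is the uniform policy, so the iterates can be compared against `[1/3, 1/3, 1/3]` as a sanity check. The entropy term is what damps the cycling that unregularized multiplicative updates typically exhibit in rock-paper-scissors.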

Related benchmarks

Task	Dataset	Result	Rank
Go	Go 9x9 (head-to-head match)	Winrate: 100	5

