Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games

Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang

Keywords: [ Reinforcement Learning and Planning ]


We study the global convergence and global optimality of the actor-critic algorithm applied for the zero-sum two-player stochastic games in a decentralized manner. We focus on the single-timescale setting where the critic is updated by applying the Bellman operator only once and the actor is updated by policy gradient with the information from the critic. Our algorithm is in a decentralized manner, as we assume that each player has no access to the actions of the other one, which, in a way, protects the privacy of both players. Moreover, we consider linear function approximations for both actor and critic, and we prove that the sequence of joint policy generated by our decentralized linear algorithm converges to the minimax equilibrium at a sublinear rate (\cO(\sqrt{K})), where (K) is the number of iterations. To the best of our knowledge, we establish the global optimality and convergence of decentralized actor-critic algorithm on zero-sum two-player stochastic games with linear function approximations for the first time.

Chat is not available.