Presentation in Workshop: Workshop on Reinforcement Learning Theory
Invited Speaker: Qiaomin Xie: Reinforcement Learning for Zero-Sum Markov Games Using Function Approximation and Correlated Equilibrium
Qiaomin Xie
We consider on-policy reinforcement learning for two-player zero-sum Markov games with simultaneous moves, which generalize both matrix games and Markov decision processes. To ensure exploration, our algorithm constructs upper and lower confidence bounds on the value for both players in each round. We show that this can be done by finding a Coarse Correlated Equilibrium (CCE) of an appropriate general-sum matrix game, which can be solved in polynomial time. The algorithm applies both to the offline setting, where the goal is to find a Nash Equilibrium of the Markov game, and to the online setting, where one plays against an arbitrary opponent and aims to minimize regret. For both settings, we provide sample complexity and regret guarantees.
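As a concrete illustration of the polynomial-time step above, the following minimal sketch (not the authors' implementation) computes a CCE of a general-sum bimatrix game by linear programming. The payoff matrices A and B, the welfare-maximizing objective used as a tie-breaker, and the small example game are illustrative assumptions.

```python
# Minimal sketch: compute a coarse correlated equilibrium (CCE) of a
# general-sum bimatrix game (A for player 1, B for player 2) via an LP.
# Any feasible point of the constraints below is a CCE; we break ties by
# maximizing total payoff (an arbitrary illustrative choice).
import numpy as np
from scipy.optimize import linprog

def coarse_correlated_equilibrium(A, B):
    """Return a joint distribution x (shape m x n) over action pairs that is a CCE of (A, B)."""
    m, n = A.shape
    dim = m * n  # x is flattened row-major: x[i, j] -> x_flat[i * n + j]

    rows = []
    # Player 1: for every fixed deviation i',
    #   sum_{i,j} x[i,j] * A[i,j] >= sum_{i,j} x[i,j] * A[i',j]
    # i.e. (A[i',j] - A[i,j]) . x <= 0.
    for i_dev in range(m):
        rows.append((np.repeat(A[i_dev][None, :], m, axis=0) - A).ravel())
    # Player 2: for every fixed deviation j',
    #   sum_{i,j} x[i,j] * B[i,j] >= sum_{i,j} x[i,j] * B[i,j'].
    for j_dev in range(n):
        rows.append((np.repeat(B[:, j_dev][:, None], n, axis=1) - B).ravel())

    A_ub, b_ub = np.array(rows), np.zeros(len(rows))
    A_eq, b_eq = np.ones((1, dim)), np.array([1.0])  # probabilities sum to one
    c = -(A + B).ravel()                             # maximize welfare as tie-breaker

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * dim, method="highs")
    return res.x.reshape(m, n)

# Example: a prisoner's-dilemma-like general-sum game, whose unique CCE
# puts all mass on the mutually dominant action pair.
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = np.array([[3.0, 5.0], [0.0, 1.0]])
print(np.round(coarse_correlated_equilibrium(A, B), 3))
```

Because a CCE only constrains deviations to fixed actions, the feasible set is a polytope described by linearly many constraints per player, which is what makes this step tractable even for general-sum payoff matrices.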
Our results generalize to Markov games with continuous and unbounded state spaces. Using linear function approximation, we develop an optimistic variant of least-squares minimax value iteration, for which we provide performance guarantees that depend only on the linear dimension. This setting highlights the analytical challenges that arise from the non-Lipschitzness of CCEs of general-sum games.
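The sketch below shows, under stated assumptions, what one optimistic least-squares backup with linear features can look like: ridge-regress the targets onto features and add an exploration bonus proportional to the feature norm in the inverse covariance. The feature map, the bonus coefficient beta, and the ridge parameter lam are hypothetical placeholders; the paper specifies the exact algorithm and constants.

```python
# Hedged, illustrative sketch of one backward step of optimistic least-squares
# value iteration with linear function approximation. Not the paper's exact
# algorithm: `phi`, `beta`, and `lam` are illustrative assumptions.
import numpy as np

def optimistic_q_estimate(phi, rewards, next_values, lam=1.0, beta=1.0):
    """Ridge-regress targets r + V_next onto features and add a UCB-style bonus.

    phi:         (N, d) feature matrix for observed (state, joint-action) tuples
    rewards:     (N,) observed rewards
    next_values: (N,) estimated values of the successor states
    Returns a function q(features) giving an optimistic Q estimate.
    """
    N, d = phi.shape
    targets = rewards + next_values
    # Regularized least squares: Lambda = lam * I + sum_t phi_t phi_t^T.
    Lambda = lam * np.eye(d) + phi.T @ phi
    w = np.linalg.solve(Lambda, phi.T @ targets)
    Lambda_inv = np.linalg.inv(Lambda)

    def q(features):
        # Point estimate plus exploration bonus beta * ||features||_{Lambda^{-1}}.
        bonus = beta * np.sqrt(
            np.einsum("...i,ij,...j->...", features, Lambda_inv, features))
        return features @ w + bonus

    return q
```

In a zero-sum Markov game, the optimistic (and a symmetric pessimistic) Q estimate at each stage would then be evaluated on the joint-action matrix of the current state, and the stage value read off from a CCE of that matrix, as in the earlier sketch.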
Based on https://arxiv.org/abs/2002.07066.