Spotlight
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
Wei Xiong · Han Zhong · Chengshuai Shi · Cong Shen · Tong Zhang

Thu Jul 21 08:30 AM -- 08:35 AM (PDT) @ Room 310
Existing studies on provably efficient algorithms for Markov games (MGs) almost exclusively build on the "optimism in the face of uncertainty" (OFU) principle. This work focuses on a distinct approach, posterior sampling, which is celebrated in many bandit and reinforcement learning settings but remains under-explored for MGs. Specifically, for episodic two-player zero-sum MGs, a novel posterior sampling algorithm is developed with *general* function approximation. Theoretical analysis demonstrates that the posterior sampling algorithm admits a $\sqrt{T}$-regret bound for problems with a low multi-agent decoupling coefficient, a new complexity measure for MGs, where $T$ denotes the number of episodes. When specialized to linear MGs, the obtained regret bound matches the state-of-the-art results. To the best of our knowledge, this is the first provably efficient posterior sampling algorithm for MGs with frequentist regret guarantees, which extends the toolbox for MGs and promotes the broad applicability of posterior sampling.

#### Author Information

##### Tong Zhang (HKUST)

Tong Zhang is a professor of Computer Science and Mathematics at the Hong Kong University of Science and Technology. His research interests are machine learning, big data, and their applications. He obtained a BA in Mathematics and Computer Science from Cornell University and a PhD in Computer Science from Stanford University. Before joining HKUST, Tong Zhang was a professor at Rutgers University, and he previously worked at IBM and Yahoo as a research scientist, at Baidu as the director of the Big Data Lab, and at Tencent as the founding director of the AI Lab. Tong Zhang was an ASA fellow and IMS fellow, and has served as chair or area chair at major machine learning conferences such as NIPS, ICML, and COLT, and as an associate editor of top machine learning journals such as PAMI, JMLR, and Machine Learning Journal.