Timezone: »

Short Talk 5 - Near-Optimal Reinforcement Learning with Self-Play
Tiancheng Yu

Fri Jul 17 12:20 PM -- 12:35 PM (PDT) @

This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct supervision. In a tabular episodic Markov game with states, max-player actions and min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires steps of game playing, when only highlighting the dependency on . In contrast, the best existing lower bound scales as and has a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity , and a new \emph{Nash V-learning} algorithm with sample complexity . The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. We complement our upper bounds with a computational hardness result for achieving sublinear regret when playing against adversarial opponents in Markov games.

Yu Bai, Chi Jin, Tiancheng Yu

Author Information

Tiancheng Yu (MIT)

I am a 2nd-year PhD student in MIT EECS, advised by Prof. Suvrit Sra. Before MIT, I was an undergraduate student in Tsinghua University, major in Electronic Engineering and minor in Statistics.

More from the Same Authors