

Talk in Workshop: Theoretical Foundations of Reinforcement Learning

Short Talk 5 - Near-Optimal Reinforcement Learning with Self-Play

Tiancheng Yu


Abstract:

This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms, which learn the optimal policy by playing against themselves without any direct supervision. In a tabular episodic Markov game with $S$ states, $A$ max-player actions and $B$ min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires $\tilde{\mathcal{O}}(S^2AB)$ steps of game playing, when only highlighting the dependency on $(S, A, B)$. In contrast, the best existing lower bound scales as $\Omega(S(A+B))$ and has a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$. The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. We complement our upper bounds with a computational hardness result for achieving sublinear regret when playing against adversarial opponents in Markov games.
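To give a rough sense of the kind of update an optimistic Q-learning style self-play algorithm performs in a tabular Markov game, here is a minimal sketch. It is not the paper's algorithm: the function names, the bonus constant `c_bonus`, and the use of a single upper estimate with an exact matrix-game value (instead of the coupled upper/lower estimates and the coarse correlated equilibrium subroutine used in the paper) are all simplifying assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(Q):
    """Value of a zero-sum matrix game Q (max-player over rows, min-player over
    columns), via the standard LP: maximize v s.t. x^T Q[:, b] >= v for every
    column b, with x a probability vector over rows."""
    A, B = Q.shape
    c = np.zeros(A + 1)
    c[-1] = -1.0                                   # minimize -v  <=>  maximize v
    A_ub = np.hstack([-Q.T, np.ones((B, 1))])      # -x^T Q[:, b] + v <= 0
    b_ub = np.zeros(B)
    A_eq = np.zeros((1, A + 1))
    A_eq[0, :A] = 1.0                              # x sums to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * A + [(None, None)]      # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]


def optimistic_nash_q_step(Q_up, V_up, N, h, s, a, b, r, s_next, H, c_bonus=1.0):
    """One optimistic update after observing (s, a, b, r, s_next) at step h of an
    H-step episode. Q_up has shape (H, S, A, B), V_up has shape (H+1, S) with
    V_up[H] = 0, and N counts visits. The bonus form here is illustrative only."""
    N[h, s, a, b] += 1
    t = N[h, s, a, b]
    alpha = (H + 1.0) / (H + t)                    # Q-learning step size
    bonus = c_bonus * np.sqrt(H ** 3 / t)          # optimism bonus (simplified)
    target = r + V_up[h + 1, s_next] + bonus
    Q_up[h, s, a, b] = (1 - alpha) * Q_up[h, s, a, b] + alpha * target
    Q_up[h, s, a, b] = min(Q_up[h, s, a, b], H)    # returns are bounded by H
    V_up[h, s] = matrix_game_value(Q_up[h, s])     # value of the stage game at (h, s)
```

In the paper itself, Nash Q-learning maintains both upper and lower estimates and draws actions from a coarse correlated equilibrium of the induced stage game, while Nash V-learning removes the $S \times A \times B$ table entirely by running per-player adversarial-bandit updates, which is what yields the $\tilde{\mathcal{O}}(S(A+B))$ rate; the sketch above only conveys the optimistic bookkeeping shared by this family of algorithms.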

Yu Bai, Chi Jin, Tiancheng Yu
