Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang

Wed Jul 21 06:00 AM -- 06:20 AM (PDT)
While single-agent policy optimization in a fixed environment has recently attracted substantial research attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take a step forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms combine Upper Confidence Bound (UCB)-type optimism with fictitious play in the setting of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\widetilde{\mathcal{O}}(\sqrt{T})$.
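The abstract's key algorithmic ingredient is the combination of UCB-type optimism with fictitious play. The paper's algorithms operate on Markov games with factored or single-controller transitions, which this sketch does not reproduce; instead, the following is a minimal illustration of the two ingredients in a toy zero-sum matrix game with noisy payoffs and a stand-in random opponent. The game size, bonus constant, and noise level are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 3x3 zero-sum matrix game with noisy payoff
# feedback, standing in for the paper's Markov-game setting.
A = rng.uniform(-1, 1, size=(3, 3))   # true payoff matrix (row player maximizes)
T = 5000

n1, n2 = A.shape
opp_counts = np.ones(n2)              # fictitious-play counts of the opponent's actions
pair_counts = np.ones((n1, n2))       # visit counts for each action pair
payoff_sum = np.zeros((n1, n2))       # running payoff sums for empirical means

actions1, actions2 = [], []
for t in range(1, T + 1):
    # UCB-type optimism: empirical mean plus an exploration bonus.
    mean = payoff_sum / pair_counts
    bonus = np.sqrt(2.0 * np.log(t + 1) / pair_counts)
    optimistic = mean + bonus

    # Fictitious play: best-respond to the opponent's empirical action frequencies.
    opp_freq = opp_counts / opp_counts.sum()
    a1 = int(np.argmax(optimistic @ opp_freq))

    # The column player here just plays uniformly at random (a stand-in opponent).
    a2 = int(rng.integers(n2))

    reward = A[a1, a2] + 0.1 * rng.standard_normal()
    opp_counts[a2] += 1
    pair_counts[a1, a2] += 1
    payoff_sum[a1, a2] += reward
    actions1.append(a1)
    actions2.append(a2)

# Regret of the row player against the best single action in hindsight,
# mirroring the abstract's best-policy-in-hindsight comparator.
realized = sum(A[a1, a2] for a1, a2 in zip(actions1, actions2))
best_hindsight = max(sum(A[i, a2] for a2 in actions2) for i in range(n1))
print(f"regret after T={T}: {best_hindsight - realized:.2f}")
```

In the paper's actual setting the comparator is a best policy in hindsight over a Markov game and the bonuses come from confidence sets for the unknown structured transitions; the sketch only conveys the optimism-plus-fictitious-play interplay in the simplest stateless case.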

Author Information

Shuang Qiu (University of Michigan)
Xiaohan Wei (Facebook)
Jieping Ye (University of Michigan)
Zhaoran Wang (Northwestern University)
Zhuoran Yang (Princeton University)
