

Learning While Playing in Mean-Field Games: Convergence and Optimality

Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca

Keywords: [ Reinforcement Learning and Planning ] [ Multi-Agent RL ] [ Algorithms ] [ Components Analysis (e.g., CCA, ICA, LDA, PCA) ] [ Privacy, Anonymity, and Security ]


We study reinforcement learning in mean-field games. To reach the Nash equilibrium, which consists of a policy and a mean-field state, existing algorithms must compute the optimal policy for each fixed mean-field state. In practice, however, the policy and the mean-field state evolve simultaneously, as each agent is learning while playing. To bridge this gap, we propose a fictitious play algorithm that alternately updates the policy (learning) and the mean-field state (playing) by one step of policy optimization and one step of gradient descent, respectively. Despite the nonstationarity induced by this alternating scheme, we prove that the proposed algorithm converges to the Nash equilibrium with an explicit convergence rate. To the best of our knowledge, it is the first provably efficient algorithm that achieves learning while playing via alternating updates.
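The alternating scheme described above can be illustrated on a toy problem. The following is a minimal sketch, not the algorithm analyzed in the paper: it assumes a hypothetical stateless crowd-aversion game in which the reward of action a is base[a] minus a congestion penalty proportional to the fraction of the population choosing a. The learning step is a single multiplicative-weights (mirror descent) policy update, and the playing step is a single damped move of the mean field toward the current policy, used here as a stand-in for the gradient step. All names, the reward model, and the step sizes `eta` and `alpha` are assumptions made for illustration.

```python
import math


def reward(base, crowd_cost, mean_field):
    """Hypothetical reward model: each action pays base[a] minus a congestion penalty."""
    return [b - crowd_cost * m for b, m in zip(base, mean_field)]


def policy_step(policy, rewards, eta):
    """Learning: one multiplicative-weights (mirror descent) step on the policy."""
    weights = [p * math.exp(eta * r) for p, r in zip(policy, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_field_step(mean_field, policy, alpha):
    """Playing: one damped step of the mean field toward the current policy."""
    return [(1.0 - alpha) * m + alpha * p for m, p in zip(mean_field, policy)]


def exploitability(policy, rewards):
    """Gain available to an agent that deviates to the best single action."""
    return max(rewards) - sum(p * r for p, r in zip(policy, rewards))


def learn_while_playing(base, crowd_cost=1.0, eta=0.2, alpha=0.1, num_iters=5000):
    """Alternate one policy step and one mean-field step per iteration.

    Neither update is run to convergence before the other one moves.
    """
    n = len(base)
    policy = [1.0 / n] * n
    mean_field = [1.0 / n] * n
    for _ in range(num_iters):
        rewards = reward(base, crowd_cost, mean_field)
        policy = policy_step(policy, rewards, eta)
        mean_field = mean_field_step(mean_field, policy, alpha)
    return policy, mean_field


if __name__ == "__main__":
    base = [1.0, 0.8, 0.5]
    policy, mean_field = learn_while_playing(base)
    gap = exploitability(policy, reward(base, 1.0, mean_field))
    print("policy:", [round(p, 4) for p in policy])
    print("mean field:", [round(m, 4) for m in mean_field])
    print("exploitability:", gap)
```

In this toy game the equilibrium equalizes the rewards of all actions that are played, so a small exploitability together with a mean field that matches the policy indicates that the alternating iteration has settled near a Nash equilibrium. The sketch says nothing about the convergence rate established in the paper.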
