
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
Yulai Zhao · Zhuoran Yang · Zhaoran Wang · Jason Lee

Wed Jul 26 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #233

Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance difference lemma that characterizes the landscape of multi-agent policy optimization, we find that the localized action value function serves as an ideal descent direction for each local policy. Motivated by this observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO. We prove that, under standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate. We extend our algorithm to the off-policy setting and introduce pessimism into policy evaluation, which aligns with experiments. To our knowledge, this is the first provably convergent multi-agent PPO algorithm in cooperative Markov games.
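To make the per-agent update concrete, the sketch below shows a PPO-style clipped surrogate evaluated for a single agent on one sampled (state, action) pair, which is the building block the abstract describes (each agent's local policy updated "similarly to vanilla PPO"). This is a minimal illustration, not the authors' implementation: the softmax parameterization, the localized advantage value `advantage`, and the clipping threshold `eps` are assumptions standing in for the quantities defined in the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a vector of action logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def clipped_objective(new_logits, old_probs, action, advantage, eps=0.2):
    """PPO-clip surrogate for one agent's sampled (state, action) pair.

    new_logits : current logits of this agent's local policy at the state
    old_probs  : action probabilities of the behavior (old) local policy
    action     : index of the sampled local action
    advantage  : localized advantage estimate for this agent (assumed given)
    """
    ratio = softmax(new_logits)[action] / old_probs[action]
    # Clipping keeps the local policy close to the old one, as in vanilla PPO.
    return min(ratio * advantage,
               np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
```

In a cooperative Markov game each agent would ascend its own copy of this objective independently, plugging in its localized advantage; the paper's analysis shows these purely local updates nonetheless drive the joint policy toward the global optimum.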

Author Information

Yulai Zhao (Princeton University)

I’m a Ph.D. student at Princeton focusing on machine learning. I’m fortunate to be co-advised by Professor S. Y. Kung and Professor Jason D. Lee. My research interests lie in the theory and experiments of modern machine learning, and I am eager to investigate the potential of modern data-driven approaches in practice.

Zhuoran Yang (Yale University)
Zhaoran Wang (Northwestern University)
Jason Lee (Princeton University)
