Game optimization has been extensively studied in the setting where decision variables lie in a finite-dimensional space, in which solutions correspond to pure strategies at the Nash equilibrium (NE) and the gradient descent-ascent (GDA) method is widely used in practice. In this paper, we consider infinite-dimensional zero-sum games, formulated as a min-max distributional optimization problem over a space of probability measures defined on a continuous variable set; this formulation is inspired by finding a mixed NE for finite-dimensional zero-sum games. We then aim to answer the following question: \textit{Will GDA-type algorithms remain provably efficient when extended to infinite-dimensional zero-sum games?} To answer this question, we propose a particle-based variational transport algorithm based on GDA in functional spaces. Specifically, the algorithm performs multi-step functional gradient descent-ascent in the Wasserstein space by pushing two sets of particles in the variable space. By characterizing the gradient estimation error arising from variational-form maximization and the convergence behavior of each player under different objective landscapes, we rigorously prove that the generalized GDA algorithm converges efficiently to the NE, or to the value of the game, for a class of games satisfying the Polyak-\L ojasiewicz (PL) condition. In summary, we provide complete statistical and convergence guarantees for solving an infinite-dimensional zero-sum game via a provably efficient particle-based method. Additionally, our work provides the first thorough statistical analysis of particle-based algorithms for learning an objective functional with a variational form using universal approximators (\textit{i.e.}, neural networks (NNs)), which is of independent interest.
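As a toy illustration of the particle-based GDA idea sketched in the abstract (this is not the paper's algorithm; the payoff, step sizes, and particle counts below are illustrative assumptions), the following snippet runs multi-step gradient descent-ascent on two particle sets representing the two players' mixed strategies, for the strongly convex-concave payoff F(mu, nu) = E[x^2]/2 - E[y^2]/2 + E[x]E[y], whose unique equilibrium places all mass at zero:

```python
import numpy as np

def particle_gda(n=50, outer=500, inner=5, eta=0.1, seed=0):
    """Toy particle-based gradient descent-ascent.

    Two particle sets represent the min player's measure (x) and the
    max player's measure (y).  The expected payoff under the product
    measure is F = mean(x**2)/2 - mean(y**2)/2 + mean(x)*mean(y), and
    each particle is pushed along its per-particle functional gradient:
    descent for x, multi-step ascent for y (mirroring the multi-step
    GDA structure described in the abstract).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)   # min player's particles
    y = rng.normal(size=n)   # max player's particles
    for _ in range(outer):
        for _ in range(inner):           # multi-step ascent for the max player
            y += eta * (x.mean() - y)    # dF/dy_i, up to a 1/n factor
        x -= eta * (x + y.mean())        # dF/dx_i, up to a 1/n factor
    return x, y

x, y = particle_gda()
# Both particle clouds contract toward the equilibrium at zero.
```

For this payoff the updates are a contraction, so both clouds collapse to the equilibrium; the paper's contribution is precisely to quantify when such particle transport converges for a much richer class of objectives under the PL condition.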
Author Information
Lewis Liu (Mila & DIRO)
Yufeng Zhang (Northwestern University)
Zhuoran Yang (Princeton)
Reza Babanezhad (Samsung)
Zhaoran Wang (Northwestern University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport »
  Thu. Jul 22nd 04:00 -- 06:00 AM
More from the Same Authors
- 2021 : Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning »
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 : Randomized Least Squares Policy Optimization »
  Haque Ishfaq · Zhuoran Yang · Andrei Lupu · Viet Nguyen · Lewis Liu · Riashat Islam · Zhaoran Wang · Doina Precup
- 2021 : Is Pessimism Provably Efficient for Offline RL? »
  Ying Jin · Zhuoran Yang · Zhaoran Wang
- 2023 : Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees »
  Sharan Vaswani · Amirreza Kazemi · Reza Babanezhad · Nicolas Le Roux
- 2023 Poster: Fast Online Node Labeling for Very Large Graphs »
  Baojian Zhou · Yifan Sun · Reza Babanezhad
- 2023 Poster: Target-based Surrogates for Stochastic Optimization »
  Jonathan Lavington · Sharan Vaswani · Reza Babanezhad · Mark Schmidt · Nicolas Le Roux
- 2023 Poster: Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning »
  Yulai Zhao · Zhuoran Yang · Zhaoran Wang · Jason Lee
- 2023 Poster: Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments »
  Yixuan Wang · Simon Zhan · Ruochen Jiao · Zhilu Wang · Wanxin Jin · Zhuoran Yang · Zhaoran Wang · Chao Huang · Qi Zhu
- 2023 Poster: Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics »
  Shenao Zhang · Wanxin Jin · Zhaoran Wang
- 2023 Poster: Achieving Hierarchy-Free Approximation for Bilevel Programs with Equilibrium Constraints »
  Jiayang Li · Jing Yu · Boyi Liu · Yu Nie · Zhaoran Wang
- 2022 Poster: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation »
  Zhihan Liu · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes »
  Hongyi Guo · Qi Cai · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets »
  Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang
- 2022 Spotlight: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes »
  Hongyi Guo · Qi Cai · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
- 2022 Spotlight: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation »
  Zhihan Liu · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
- 2022 Spotlight: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets »
  Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang
- 2022 Poster: Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent »
  Sharan Vaswani · Benjamin Dubois-Taine · Reza Babanezhad
- 2022 Poster: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency »
  Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Adaptive Model Design for Markov Decision Process »
  Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang
- 2022 Spotlight: Adaptive Model Design for Markov Decision Process »
  Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang
- 2022 Spotlight: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency »
  Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2022 Oral: Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent »
  Sharan Vaswani · Benjamin Dubois-Taine · Reza Babanezhad
- 2022 Poster: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning »
  Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang
- 2022 Poster: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning »
  Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy »
  Zhihan Liu · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang
- 2022 Poster: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation »
  Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang
- 2022 Spotlight: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy »
  Zhihan Liu · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang
- 2022 Spotlight: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning »
  Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang
- 2022 Spotlight: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation »
  Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang
- 2022 Spotlight: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning »
  Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang
- 2021 Poster: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games »
  Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
- 2021 Spotlight: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games »
  Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
- 2021 Poster: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality »
  Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang
- 2021 Poster: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
  Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang
- 2021 Spotlight: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality »
  Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang
- 2021 Spotlight: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
  Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang
- 2021 Poster: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions »
  Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang
- 2021 Poster: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game »
  Shuang Qiu · Jieping Ye · Zhaoran Wang · Zhuoran Yang
- 2021 Poster: Principled Exploration via Optimistic Bootstrapping and Backward Induction »
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 Oral: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game »
  Shuang Qiu · Jieping Ye · Zhaoran Wang · Zhuoran Yang
- 2021 Spotlight: Principled Exploration via Optimistic Bootstrapping and Backward Induction »
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 Oral: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions »
  Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang
- 2021 Poster: Learning While Playing in Mean-Field Games: Convergence and Optimality »
  Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca
- 2021 Poster: Affine Invariant Analysis of Frank-Wolfe on Strongly Convex Sets »
  Thomas Kerdreux · Lewis Liu · Simon Lacoste-Julien · Damien Scieur
- 2021 Poster: Is Pessimism Provably Efficient for Offline RL? »
  Ying Jin · Zhuoran Yang · Zhaoran Wang
- 2021 Spotlight: Affine Invariant Analysis of Frank-Wolfe on Strongly Convex Sets »
  Thomas Kerdreux · Lewis Liu · Simon Lacoste-Julien · Damien Scieur
- 2021 Spotlight: Is Pessimism Provably Efficient for Offline RL? »
  Ying Jin · Zhuoran Yang · Zhaoran Wang
- 2021 Spotlight: Learning While Playing in Mean-Field Games: Convergence and Optimality »
  Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca
- 2021 Poster: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach »
  Yingjie Fei · Zhuoran Yang · Zhaoran Wang
- 2021 Oral: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach »
  Yingjie Fei · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning »
  Lingxiao Wang · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate »
  Yufeng Zhang · Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: Provably Efficient Exploration in Policy Optimization »
  Qi Cai · Zhuoran Yang · Chi Jin · Zhaoran Wang
- 2020 Poster: On the Global Optimality of Model-Agnostic Meta-Learning »
  Lingxiao Wang · Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees »
  Sen Na · Yuwei Luo · Zhuoran Yang · Zhaoran Wang · Mladen Kolar
- 2019 Poster: On the statistical rate of nonlinear recovery in generative models with heavy-tailed data »
  Xiaohan Wei · Zhuoran Yang · Zhaoran Wang
- 2019 Oral: On the statistical rate of nonlinear recovery in generative models with heavy-tailed data »
  Xiaohan Wei · Zhuoran Yang · Zhaoran Wang
- 2018 Poster: The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference »
  Hao Lu · Yuan Cao · Junwei Lu · Han Liu · Zhaoran Wang
- 2018 Oral: The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference »
  Hao Lu · Yuan Cao · Junwei Lu · Han Liu · Zhaoran Wang