Timezone: »
Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. We consider a variant of this problem where the game payoff matrix changes over time, possibly in an adversarial manner. We first present three performance measures to guide the algorithmic design for this problem: 1) the well-studied \emph{individual regret}, 2) an extension of \emph{duality gap}, and 3) a new measure called \emph{dynamic Nash Equilibrium regret}, which quantifies the cumulative difference between the player's payoff and the minimax game value. Next, we develop a single parameter-free algorithm that \emph{simultaneously} enjoys favorable guarantees under all these three performance measures. These guarantees are adaptive to different non-stationarity measures of the payoff matrices and, importantly, recover the best known results when the payoff matrix is fixed. Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property, along with several novel ingredients specifically designed for the time-varying game setting. Empirical results further validate the effectiveness of our algorithm.
Author Information
Mengxiao Zhang (University of Southern California)
Peng Zhao (Nanjing University)
Haipeng Luo (University of Southern California)
Zhi-Hua Zhou (Nanjing University)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: No-Regret Learning in Time-Varying Zero-Sum Games »
Wed. Jul 20th 05:35 -- 05:40 PM Room Ballroom 3 & 4
More from the Same Authors
-
2021 : Online Learning for Stochastic Shortest Path Model via Posterior Sampling »
Mehdi Jafarnia · Liyu Chen · Rahul Jain · Haipeng Luo -
2021 : The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tiancheng Jin · Longbo Huang · Haipeng Luo -
2021 : Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses »
Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee -
2021 : Implicit Finite-Horizon Approximation for Stochastic Shortest Path »
Liyu Chen · Mehdi Jafarnia · Rahul Jain · Haipeng Luo -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2022 : Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits »
Yulian Wu · Youming Tao · Peng Zhao · Di Wang -
2023 : Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games »
Yang Cai · Haipeng Luo · Chen-Yu Wei · Weiqiang Zheng -
2023 Poster: Fast Rates in Time-Varying Strongly Monotone Games »
Yu-Hu Yan · Peng Zhao · Zhi-Hua Zhou -
2023 Poster: Estimating Possible Causal Effects with Latent Variables via Adjustment »
Tian-Zuo Wang · Tian Qin · Zhi-Hua Zhou -
2023 Poster: Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization »
SIJIA CHEN · Wei-Wei Tu · Peng Zhao · Lijun Zhang -
2023 Poster: Refined Regret for Adversarial MDPs with Linear Function Approximation »
Yan Dai · Haipeng Luo · Chen-Yu Wei · Julian Zimmert -
2023 Poster: Identifying Useful Learnwares for Heterogeneous Label Spaces »
Lan-Zhe Guo · Zhi Zhou · Yu-Feng Li · Zhi-Hua Zhou -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2022 Poster: Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games »
Gabriele Farina · Chung-Wei Lee · Haipeng Luo · Christian Kroer -
2022 Spotlight: Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games »
Gabriele Farina · Chung-Wei Lee · Haipeng Luo · Christian Kroer -
2022 Poster: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP »
Liyu Chen · Rahul Jain · Haipeng Luo -
2022 Poster: Dynamic Regret of Online Markov Decision Processes »
Peng Zhao · Long-Fei Li · Zhi-Hua Zhou -
2022 Poster: Learning Infinite-horizon Average-reward Markov Decision Process with Constraints »
Liyu Chen · Rahul Jain · Haipeng Luo -
2022 Spotlight: Learning Infinite-horizon Average-reward Markov Decision Process with Constraints »
Liyu Chen · Rahul Jain · Haipeng Luo -
2022 Oral: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP »
Liyu Chen · Rahul Jain · Haipeng Luo -
2022 Spotlight: Dynamic Regret of Online Markov Decision Processes »
Peng Zhao · Long-Fei Li · Zhi-Hua Zhou -
2021 : Implicit Finite-Horizon Approximation for Stochastic Shortest Path »
Liyu Chen · Mehdi Jafarnia · Rahul Jain · Haipeng Luo -
2021 Poster: Budgeted Heterogeneous Treatment Effect Estimation »
Tian Qin · Tian-Zuo Wang · Zhi-Hua Zhou -
2021 Spotlight: Budgeted Heterogeneous Treatment Effect Estimation »
Tian Qin · Tian-Zuo Wang · Zhi-Hua Zhou -
2021 Poster: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang -
2021 Poster: Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case »
Liyu Chen · Haipeng Luo -
2021 Spotlight: Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case »
Liyu Chen · Haipeng Luo -
2021 Spotlight: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang -
2020 Poster: Cost-effectively Identifying Causal Effects When Only Response Variable is Observable »
Tian-Zuo Wang · Xi-Zhu Wu · Sheng-Jun Huang · Zhi-Hua Zhou -
2020 Poster: Learning with Feature and Distribution Evolvable Streams »
Zhen-Yu Zhang · Peng Zhao · Yuan Jiang · Zhi-Hua Zhou -
2020 Poster: Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition »
Chi Jin · Tiancheng Jin · Haipeng Luo · Suvrit Sra · Tiancheng Yu -
2019 Poster: Adaptive Regret of Convex and Smooth Functions »
Lijun Zhang · Tie-Yan Liu · Zhi-Hua Zhou -
2019 Oral: Adaptive Regret of Convex and Smooth Functions »
Lijun Zhang · Tie-Yan Liu · Zhi-Hua Zhou -
2019 Poster: Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin »
Xi-Zhu Wu · Song Liu · Zhi-Hua Zhou -
2019 Oral: Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin »
Xi-Zhu Wu · Song Liu · Zhi-Hua Zhou -
2018 Poster: Rectify Heterogeneous Models with Semantic Mapping »
Han-Jia Ye · De-Chuan Zhan · Yuan Jiang · Zhi-Hua Zhou -
2018 Poster: Dynamic Regret of Strongly Adaptive Methods »
Lijun Zhang · Tianbao Yang · rong jin · Zhi-Hua Zhou -
2018 Oral: Rectify Heterogeneous Models with Semantic Mapping »
Han-Jia Ye · De-Chuan Zhan · Yuan Jiang · Zhi-Hua Zhou -
2018 Oral: Dynamic Regret of Strongly Adaptive Methods »
Lijun Zhang · Tianbao Yang · rong jin · Zhi-Hua Zhou -
2018 Poster: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2017 Poster: A Unified View of Multi-Label Performance Measures »
Xi-Zhu Wu · Zhi-Hua Zhou -
2017 Talk: A Unified View of Multi-Label Performance Measures »
Xi-Zhu Wu · Zhi-Hua Zhou