Timezone: »
Oral
Best of Both Worlds Policy Optimization
Christoph Dann · Chen-Yu Wei · Julian Zimmert
Policy optimization methods are popular reinforcement learning algorithms in practice and recent works have build theoretical foundation for them by proving $\sqrt{T}$ regret bounds even when the losses are adversarial. Such bounds are tight in the worst case but often overly pessimistic. In this work, we show that by carefully designing the regularizer, bonus terms, and learning rates, one can achieve a more favorable $\text{polylog}(T)$ regret bound when the losses are stochastic, without sacrificing the worst-case guarantee in the adversarial regime. Specifically, we show the first best of both worlds guarantee for policy optimization in tabular MDPs by leveraging either a Tsallis entropy or a Shannon entropy regularizer. Then we show that under known transitions, we can further obtain a first-order regret bound in the adversarial regime by leveraging the log barrier regularizer.
Author Information
Christoph Dann (Google)
Chen-Yu Wei (University of Virginia)
Julian Zimmert (Google Research)
Related Events (a corresponding poster, oral, or spotlight)
-
2023 Poster: Best of Both Worlds Policy Optimization »
Wed. Jul 26th 09:00 -- 10:30 PM Room Exhibit Hall 1 #622
More from the Same Authors
-
2021 : Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses »
Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee -
2023 : Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games »
Yang Cai · Haipeng Luo · Chen-Yu Wei · Weiqiang Zheng -
2023 Poster: Learning in POMDPs is Sample-Efficient with Hindsight Observability »
Jonathan Lee · Alekh Agarwal · Christoph Dann · Tong Zhang -
2023 Poster: Reinforcement Learning Can Be More Efficient with Multiple Rewards »
Christoph Dann · Yishay Mansour · Mehryar Mohri -
2023 Poster: Refined Regret for Adversarial MDPs with Linear Function Approximation »
Yan Dai · Haipeng Luo · Chen-Yu Wei · Julian Zimmert -
2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Poster: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence »
Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic -
2022 Oral: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence »
Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic -
2021 Poster: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang -
2021 Poster: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Spotlight: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Spotlight: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang -
2020 Poster: Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes »
Chen-Yu Wei · Mehdi Jafarnia · Haipeng Luo · Hiteshi Sharma · Rahul Jain -
2020 Poster: Online Learning for Active Cache Synchronization »
Andrey Kolobov · Sebastien Bubeck · Julian Zimmert -
2019 Poster: Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case »
Alina Beygelzimer · David Pal · Balazs Szorenyi · Devanathan Thiruvenkatachari · Chen-Yu Wei · Chicheng Zhang -
2019 Oral: Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case »
Alina Beygelzimer · David Pal · Balazs Szorenyi · Devanathan Thiruvenkatachari · Chen-Yu Wei · Chicheng Zhang -
2019 Poster: Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously »
Julian Zimmert · Haipeng Luo · Chen-Yu Wei -
2019 Oral: Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously »
Julian Zimmert · Haipeng Luo · Chen-Yu Wei -
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2018 Poster: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill -
2018 Oral: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill