Poster
Optimistic Policy Optimization with Bandit Feedback
Lior Shani · Yonathan Efroni · Aviv Rosenberg · Shie Mannor
Thu Jul 16 02:00 PM -- 02:45 PM & Fri Jul 17 02:00 AM -- 02:45 AM (PDT)
Policy optimization methods are among the most widely used classes of Reinforcement Learning (RL) algorithms. Yet, so far, such methods have mostly been analyzed from an optimization perspective, either without addressing the problem of exploration or under strong assumptions on the interaction with the environment.
In this paper we consider model-based RL in the tabular finite-horizon MDP setting with unknown transitions and bandit feedback. For this setting, we propose an optimistic trust region policy optimization (TRPO) algorithm for which we establish $\tilde O(\sqrt{S^2 A H^4 K})$ regret for stochastic rewards. Furthermore, we prove $\tilde O(\sqrt{S^2 A H^4} K^{2/3})$ regret for adversarial rewards. Interestingly, this result matches previous bounds derived for the bandit feedback case, even though those bounds assume known transitions. To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.
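To make the gap between the two rates concrete, here is a minimal sketch that evaluates both bound expressions from the abstract with constants and logarithmic factors dropped. The function names and the problem sizes (`S`, `A`, `H`, `K` values) are illustrative assumptions and do not come from the paper; the numbers show scaling only, not actual regret.

```python
import math


def stochastic_bound(S, A, H, K):
    """sqrt(S^2 * A * H^4 * K), ignoring constants and log factors."""
    return math.sqrt(S**2 * A * H**4 * K)


def adversarial_bound(S, A, H, K):
    """sqrt(S^2 * A * H^4) * K^(2/3), ignoring constants and log factors."""
    return math.sqrt(S**2 * A * H**4) * K ** (2 / 3)


if __name__ == "__main__":
    S, A, H = 10, 4, 5  # hypothetical problem sizes, chosen for illustration
    for K in (10**3, 10**6):
        s = stochastic_bound(S, A, H, K)
        a = adversarial_bound(S, A, H, K)
        # Dividing by K gives the average per-episode bound, which shrinks
        # as K grows for both rates (both are sub-linear in K).
        print(f"K={K}: stochastic/K={s / K:.3f}, "
              f"adversarial/K={a / K:.3f}, ratio={a / s:.2f}")
```

Under these expressions the adversarial bound exceeds the stochastic one by a factor of $K^{1/6}$, since $K^{2/3} / K^{1/2} = K^{1/6}$, while both divided by $K$ tend to zero as $K$ grows.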
Author Information
Lior Shani (Technion)
Yonathan Efroni (Technion)
Aviv Rosenberg (Tel Aviv University)
Shie Mannor (Technion)
More from the Same Authors
-
2021 : Minimax Regret for Stochastic Shortest Path »
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg -
2021 : Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2021 : Learning Adversarial Markov Decision Processes with Delayed Feedback »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2021 : Provable RL with Exogenous Distractors via Multistep Inverse Dynamics »
Yonathan Efroni · Dipendra Misra · Akshay Krishnamurthy · Alekh Agarwal · John Langford -
2021 : Sparsity in the Partially Controllable LQR »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2023 Poster: Representation-Driven Reinforcement Learning »
Ofir Nabati · Guy Tennenholtz · Shie Mannor -
2023 Poster: Principled Offline RL in the Presence of Rich Exogenous Information »
Riashat Islam · Manan Tomar · Alex Lamb · Yonathan Efroni · Hongyu Zang · Aniket Didolkar · Dipendra Misra · Xin Li · Harm Seijen · Remi Tachet des Combes · John Langford -
2023 Poster: Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback »
Tal Lancewicki · Aviv Rosenberg · Dmitry Sotnikov -
2023 Poster: Learning to Initiate and Reason in Event-Driven Cascading Processes »
Yuval Atzmon · Eli Meirom · Shie Mannor · Gal Chechik -
2023 Poster: Reinforcement Learning with History Dependent Dynamic Contexts »
Guy Tennenholtz · Nadav Merlis · Lior Shani · Martin Mladenov · Craig Boutilier -
2023 Poster: PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient »
Kaixin Wang · Zhou Daquan · Jiashi Feng · Shie Mannor -
2023 Poster: Learning Hidden Markov Models When the Locations of Missing Observations are Unknown »
Binyamin Perets · Mark Kozdoba · Shie Mannor -
2023 Poster: Reward-Mixing MDPs with Few Contexts are Learnable »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Poster: Analysis of Stochastic Processes through Replay Buffers »
Shirli Di-Castro Shashua · Shie Mannor · Dotan Di Castro -
2022 Poster: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Poster: Actor-Critic based Improper Reinforcement Learning »
Mohammadi Zaki · Avi Mohan · Aditya Gopalan · Shie Mannor -
2022 Poster: Optimizing Tensor Network Contraction Using Reinforcement Learning »
Eli Meirom · Haggai Maron · Shie Mannor · Gal Chechik -
2022 Poster: The Geometry of Robust Value Functions »
Kaixin Wang · Navdeep Kumar · Kuangqi Zhou · Bryan Hooi · Jiashi Feng · Shie Mannor -
2022 Spotlight: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Spotlight: The Geometry of Robust Value Functions »
Kaixin Wang · Navdeep Kumar · Kuangqi Zhou · Bryan Hooi · Jiashi Feng · Shie Mannor -
2022 Spotlight: Actor-Critic based Improper Reinforcement Learning »
Mohammadi Zaki · Avi Mohan · Aditya Gopalan · Shie Mannor -
2022 Spotlight: Analysis of Stochastic Processes through Replay Buffers »
Shirli Di-Castro Shashua · Shie Mannor · Dotan Di Castro -
2022 Spotlight: Optimizing Tensor Network Contraction Using Reinforcement Learning »
Eli Meirom · Haggai Maron · Shie Mannor · Gal Chechik -
2022 Poster: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Poster: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Poster: Provable Reinforcement Learning with a Short-Term Memory »
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi -
2022 Oral: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Spotlight: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Spotlight: Provable Reinforcement Learning with a Short-Term Memory »
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi -
2021 : Invited Speaker: Shie Mannor: Lenient Regret »
Shie Mannor -
2021 : RL + Operations Research Panel »
Jim Dai · Fei Fang · Shie Mannor · Yuandong Tian · Zhiwei (Tony) Qin · Zongqing Lu -
2021 Poster: Detecting Rewards Deterioration in Episodic Reinforcement Learning »
Ido Greenberg · Shie Mannor -
2021 Poster: Online Limited Memory Neural-Linear Bandits with Likelihood Matching »
Ofir Nabati · Tom Zahavy · Shie Mannor -
2021 Spotlight: Online Limited Memory Neural-Linear Bandits with Likelihood Matching »
Ofir Nabati · Tom Zahavy · Shie Mannor -
2021 Spotlight: Detecting Rewards Deterioration in Episodic Reinforcement Learning »
Ido Greenberg · Shie Mannor -
2021 Poster: Confidence-Budget Matching for Sequential Budgeted Learning »
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor -
2021 Spotlight: Confidence-Budget Matching for Sequential Budgeted Learning »
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor -
2021 Poster: Value Iteration in Continuous Actions, States and Time »
Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg -
2021 Spotlight: Value Iteration in Continuous Actions, States and Time »
Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg -
2021 Poster: Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks »
Eli Meirom · Haggai Maron · Shie Mannor · Gal Chechik -
2021 Spotlight: Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks »
Eli Meirom · Haggai Maron · Shie Mannor · Gal Chechik -
2020 Poster: Topic Modeling via Full Dependence Mixtures »
Dan Fisher · Mark Kozdoba · Shie Mannor -
2020 Poster: Near-optimal Regret Bounds for Stochastic Shortest Path »
Aviv Rosenberg · Alon Cohen · Yishay Mansour · Haim Kaplan -
2020 Poster: Multi-step Greedy Reinforcement Learning Algorithms »
Manan Tomar · Yonathan Efroni · Mohammad Ghavamzadeh -
2019 Poster: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Oral: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Exploration Conscious Reinforcement Learning Revisited »
Lior Shani · Yonathan Efroni · Shie Mannor -
2019 Poster: Action Robust Reinforcement Learning and Applications in Continuous Control »
Chen Tessler · Yonathan Efroni · Shie Mannor -
2019 Poster: The Natural Language of Actions »
Guy Tennenholtz · Shie Mannor -
2019 Oral: Exploration Conscious Reinforcement Learning Revisited »
Lior Shani · Yonathan Efroni · Shie Mannor -
2019 Oral: The Natural Language of Actions »
Guy Tennenholtz · Shie Mannor -
2019 Poster: Nonlinear Distributional Gradient Temporal-Difference Learning »
Chao Qu · Shie Mannor · Huan Xu -
2019 Oral: Action Robust Reinforcement Learning and Applications in Continuous Control »
Chen Tessler · Yonathan Efroni · Shie Mannor -
2019 Oral: Nonlinear Distributional Gradient Temporal-Difference Learning »
Chao Qu · Shie Mannor · Huan Xu -
2018 Poster: Beyond the One-Step Greedy Approach in Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2018 Oral: Beyond the One-Step Greedy Approach in Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2017 Workshop: Lifelong Learning: A Reinforcement Learning Approach »
Sarath Chandar · Balaraman Ravindran · Daniel J. Mankowitz · Shie Mannor · Tom Zahavy -
2017 Poster: Consistent On-Line Off-Policy Evaluation »
Assaf Hallak · Shie Mannor -
2017 Talk: Consistent On-Line Off-Policy Evaluation »
Assaf Hallak · Shie Mannor -
2017 Poster: End-to-End Differentiable Adversarial Imitation Learning »
Nir Baram · Oron Anschel · Itai Caspi · Shie Mannor -
2017 Poster: Multi-objective Bandits: Optimizing the Generalized Gini Index »
Robert Busa-Fekete · Balazs Szorenyi · Paul Weng · Shie Mannor -
2017 Talk: End-to-End Differentiable Adversarial Imitation Learning »
Nir Baram · Oron Anschel · Itai Caspi · Shie Mannor -
2017 Talk: Multi-objective Bandits: Optimizing the Generalized Gini Index »
Robert Busa-Fekete · Balazs Szorenyi · Paul Weng · Shie Mannor