Timezone: »
Poster
Actor-Critic based Improper Reinforcement Learning
Mohammadi Zaki · Avi Mohan · Aditya Gopalan · Shie Mannor
We consider an improper reinforcement learning setting where alearner is given $M$ base controllers for an unknown Markovdecision process, and wishes to combine them optimally to producea potentially new controller that can outperform each of the baseones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a given target environment with relatively few trials. Towards this, we propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic (AC) based scheme and a Natural Actor-Critic (NAC) scheme depending on the available information. Both algorithms operate over aclass of improper mixtures of the given controllers. For the first case, we derive convergence rate guarantees assuming access to a gradient oracle. For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case. Numerical results on (i) the standard control theoretic benchmark of stabilizing an inverted pendulum; and (ii) a constrainedqueueing task show that our improper policy optimization algorithm can stabilize the system even when the base policies at its disposal are unstable.
Author Information
Mohammadi Zaki (Indian Institute of Science Bangalore)
Avi Mohan (Boston University)
Aditya Gopalan (Indian Institute of Science (IISc))
Shie Mannor (Technion)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Actor-Critic based Improper Reinforcement Learning »
Wed. Jul 20th 06:20 -- 06:25 PM Room Room 307
More from the Same Authors
-
2023 Poster: Representation-Driven Reinforcement Learning »
Ofir Nabati · Guy Tennenholtz · Shie Mannor -
2023 Poster: Learning to Initiate and Reason in Event-Driven Cascading Processes »
Yuval Atzmon · Eli Meirom · Shie Mannor · Gal Chechik -
2023 Poster: PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient »
Kaixin Wang · Zhou Daquan · Jiashi Feng · Shie Mannor -
2023 Poster: Learning Hidden Markov Models When the Locations of Missing Observations are Unknown »
BINYAMIN PERETS · Mark Kozdoba · Shie Mannor -
2023 Poster: Reward-Mixing MDPs with Few Contexts are Learnable »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Poster: Optimizing Tensor Network Contraction Using Reinforcement Learning »
Eli Meirom · Haggai Maron · Shie Mannor · Gal Chechik -
2022 Poster: The Geometry of Robust Value Functions »
Kaixin Wang · Navdeep Kumar · Kuangqi Zhou · Bryan Hooi · Jiashi Feng · Shie Mannor -
2022 Spotlight: The Geometry of Robust Value Functions »
Kaixin Wang · Navdeep Kumar · Kuangqi Zhou · Bryan Hooi · Jiashi Feng · Shie Mannor -
2022 Spotlight: Optimizing Tensor Network Contraction Using Reinforcement Learning »
Eli Meirom · Haggai Maron · Shie Mannor · Gal Chechik -
2022 Poster: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Spotlight: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2021 : Invited Speaker: Shie Mannor: Lenient Regret »
Shie Mannor -
2020 Poster: From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model »
Aadirupa Saha · Aditya Gopalan -
2018 Poster: Beyond the One-Step Greedy Approach in Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2018 Oral: Beyond the One-Step Greedy Approach in Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2017 Workshop: Lifelong Learning: A Reinforcement Learning Approach »
Sarath Chandar · Balaraman Ravindran · Daniel J. Mankowitz · Shie Mannor · Tom Zahavy -
2017 Poster: Consistent On-Line Off-Policy Evaluation »
Assaf Hallak · Shie Mannor -
2017 Talk: Consistent On-Line Off-Policy Evaluation »
Assaf Hallak · Shie Mannor -
2017 Poster: End-to-End Differentiable Adversarial Imitation Learning »
Nir Baram · Oron Anschel · Itai Caspi · Shie Mannor -
2017 Poster: On Kernelized Multi-armed Bandits »
Sayak Ray Chowdhury · Aditya Gopalan -
2017 Talk: End-to-End Differentiable Adversarial Imitation Learning »
Nir Baram · Oron Anschel · Itai Caspi · Shie Mannor -
2017 Talk: On Kernelized Multi-armed Bandits »
Sayak Ray Chowdhury · Aditya Gopalan