Poster
Reward-Mixing MDPs with Few Latent Contexts are Learnable
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode, nature randomly picks a latent reward model among $M$ candidates, and an agent interacts with the MDP for $H$ time steps throughout the episode. Our goal is to learn a policy that nearly maximizes the $H$-step cumulative reward in such a model. Prior work established an upper bound for RMMDPs with $M=2$. In this work, we resolve several open questions for the general RMMDP setting. We consider an arbitrary $M\ge2$ and provide a sample-efficient algorithm, $EM^2$, that outputs an $\epsilon$-optimal policy using $O \left(\epsilon^{-2} \cdot S^d A^d \cdot \text{poly}(H, Z)^d \right)$ episodes, where $S$ and $A$ are the numbers of states and actions, respectively, $H$ is the time horizon, $Z$ is the support size of the reward distributions, and $d=O(\min(M,H))$. We also provide a $(SA)^{\Omega(\sqrt{M})} / \epsilon^{2}$ lower bound, suggesting that a sample complexity super-polynomial in $M$ is necessary.
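To make the episode structure concrete, here is a minimal Python sketch of a tabular RMMDP simulator together with a Monte Carlo estimate of a policy's expected $H$-step return, $\sum_m w_m \mathbb{E}_m^\pi[\sum_{t} r_t]$. It is not the $EM^2$ algorithm and not the authors' code; the class `RewardMixingMDP`, the helpers `estimate_value`, `two_context_example`, `always_first` and `probe_then_commit`, the mixing weights $w_m$, and the nested-list layouts are all hypothetical choices for illustration. Transitions are shared across contexts and only the reward distribution (over $Z$ values) depends on the latent model, which the policy never observes; the policy therefore receives the within-episode history.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical policy signature: (time step, current state, history of (s, a, r)) -> action.
Policy = Callable[[int, int, List[Tuple[int, int, float]]], int]


@dataclass
class RewardMixingMDP:
    """Hypothetical tabular RMMDP: shared dynamics, rewards depend on a latent context m."""
    transitions: List[List[List[float]]]          # transitions[s][a][s2] = P(s2 | s, a)
    reward_models: List[List[List[List[float]]]]  # reward_models[m][s][a][z] = P(value z | s, a, m)
    reward_values: Sequence[float]                # the Z possible reward values
    mixing_weights: Sequence[float]               # w[m] = probability that nature picks context m
    initial_state: int
    horizon: int                                  # H

    def run_episode(self, policy: Policy, rng: random.Random) -> float:
        # Nature picks the latent reward model once per episode; m is never shown to the policy.
        m = rng.choices(range(len(self.mixing_weights)), weights=self.mixing_weights)[0]
        s, total, history = self.initial_state, 0.0, []
        for t in range(self.horizon):
            a = policy(t, s, history)
            z = rng.choices(range(len(self.reward_values)), weights=self.reward_models[m][s][a])[0]
            r = self.reward_values[z]
            total += r
            history.append((s, a, r))
            s = rng.choices(range(len(self.transitions[s][a])), weights=self.transitions[s][a])[0]
        return total


def estimate_value(mdp: RewardMixingMDP, policy: Policy, episodes: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected H-step cumulative reward of `policy`."""
    rng = random.Random(seed)
    return sum(mdp.run_episode(policy, rng) for _ in range(episodes)) / episodes


def two_context_example(mixing_weights: Sequence[float], horizon: int = 3) -> RewardMixingMDP:
    """Toy instance: one state, two actions, M = 2; context m pays reward 1 only for action m."""
    return RewardMixingMDP(
        transitions=[[[1.0], [1.0]]],  # a single state that loops back to itself
        reward_models=[
            [[[0.0, 1.0], [1.0, 0.0]]],  # context 0: action 0 pays 1, action 1 pays 0
            [[[1.0, 0.0], [0.0, 1.0]]],  # context 1: action 0 pays 0, action 1 pays 1
        ],
        reward_values=[0.0, 1.0],
        mixing_weights=mixing_weights,
        initial_state=0,
        horizon=horizon,
    )


def always_first(t: int, s: int, history) -> int:
    # Memoryless policy: ignores the history entirely.
    return 0


def probe_then_commit(t: int, s: int, history) -> int:
    # Try action 0 first; keep it if it paid off, otherwise switch to action 1 for the rest.
    if not history:
        return 0
    return 0 if history[0][2] > 0 else 1
```

In the toy instance with $w = (0.5, 0.5)$ and $H = 3$, `always_first` earns 3 in context 0 and 0 in context 1 (expected value 1.5), while `probe_then_commit` earns 3 and 2 (expected value 2.5), so `estimate_value(two_context_example([0.5, 0.5]), probe_then_commit, 20000)` should come out close to 2.5. The gap illustrates why a good policy in this setting may need to condition on the within-episode history rather than only on the current state.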
Author Information
Jeongyeol Kwon (University of Wisconsin-Madison)
I am currently a postdoc at the University of Wisconsin-Madison, working with Prof. Robert Nowak. Before joining UW-Madison, I received my Ph.D. from the ECE department at UT Austin, where I spent wonderful years learning and working with my advisor, Prof. Constantine Caramanis. I received my Bachelor's degree in Electrical and Computer Engineering from Seoul National University (SNU) in 2016.
Yonathan Efroni (Meta)
Constantine Caramanis (University of Texas)
Shie Mannor (Technion)
More from the Same Authors
2021: Minimax Regret for Stochastic Shortest Path
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg
2021: Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
Yonathan Efroni · Dipendra Misra · Akshay Krishnamurthy · Alekh Agarwal · John Langford
2021: Sparsity in the Partially Controllable LQR
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
2023: Optimization or Architecture: What Matters in Non-Linear Filtering?
Ido Greenberg · Netanel Yannay · Shie Mannor
2023 Oral: A Fully First-Order Method for Stochastic Bilevel Optimization
Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
2023 Poster: Learning to Initiate and Reason in Event-Driven Cascading Processes
Yuval Atzmon · Eli Meirom · Shie Mannor · Gal Chechik
2023 Poster: Learning Hidden Markov Models When the Locations of Missing Observations are Unknown
Binyamin Perets · Mark Kozdoba · Shie Mannor
2023 Poster: PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient
Kaixin Wang · Zhou Daquan · Jiashi Feng · Shie Mannor
2023 Poster: A Fully First-Order Method for Stochastic Bilevel Optimization
Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
2023 Poster: Representation-Driven Reinforcement Learning
Ofir Nabati · Guy Tennenholtz · Shie Mannor
2023 Poster: Principled Offline RL in the Presence of Rich Exogenous Information
Riashat Islam · Manan Tomar · Alex Lamb · Yonathan Efroni · Hongyu Zang · Aniket Didolkar · Dipendra Misra · Xin Li · Harm Seijen · Remi Tachet des Combes · John Langford
2023 Poster: Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection
Haoyue Bai · Gregory Canal · Xuefeng Du · Jeongyeol Kwon · Robert Nowak · Sharon Li
2022 Poster: Asymptotically-Optimal Gaussian Bandits with Side Observations
Alexia Atsidakou · Orestis Papadigenopoulos · Constantine Caramanis · Sujay Sanghavi · Sanjay Shakkottai
2022 Spotlight: Asymptotically-Optimal Gaussian Bandits with Side Observations
Alexia Atsidakou · Orestis Papadigenopoulos · Constantine Caramanis · Sujay Sanghavi · Sanjay Shakkottai
2022 Poster: Sparsity in Partially Controllable Linear Systems
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
2022 Poster: Actor-Critic based Improper Reinforcement Learning
Mohammadi Zaki · Avi Mohan · Aditya Gopalan · Shie Mannor
2022 Poster: Optimizing Tensor Network Contraction Using Reinforcement Learning
Eli Meirom · Haggai Maron · Shie Mannor · Gal Chechik
2022 Poster: The Geometry of Robust Value Functions
Kaixin Wang · Navdeep Kumar · Kuangqi Zhou · Bryan Hooi · Jiashi Feng · Shie Mannor
2022 Spotlight: Sparsity in Partially Controllable Linear Systems
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
2022 Spotlight: The Geometry of Robust Value Functions
Kaixin Wang · Navdeep Kumar · Kuangqi Zhou · Bryan Hooi · Jiashi Feng · Shie Mannor
2022 Spotlight: Actor-Critic based Improper Reinforcement Learning
Mohammadi Zaki · Avi Mohan · Aditya Gopalan · Shie Mannor
2022 Spotlight: Optimizing Tensor Network Contraction Using Reinforcement Learning
Eli Meirom · Haggai Maron · Shie Mannor · Gal Chechik
2022 Poster: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor
2022 Poster: Provable Reinforcement Learning with a Short-Term Memory
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi
2022 Spotlight: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor
2022 Spotlight: Provable Reinforcement Learning with a Short-Term Memory
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi
2021: Invited Speaker: Shie Mannor: Lenient Regret
Shie Mannor
2021 Poster: Confidence-Budget Matching for Sequential Budgeted Learning
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor
2021 Poster: Combinatorial Blocking Bandits with Stochastic Delays
Alexia Atsidakou · Orestis Papadigenopoulos · Soumya Basu · Constantine Caramanis · Sanjay Shakkottai
2021 Spotlight: Combinatorial Blocking Bandits with Stochastic Delays
Alexia Atsidakou · Orestis Papadigenopoulos · Soumya Basu · Constantine Caramanis · Sanjay Shakkottai
2021 Spotlight: Confidence-Budget Matching for Sequential Budgeted Learning
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor
2020 Poster: Optimistic Policy Optimization with Bandit Feedback
Lior Shani · Yonathan Efroni · Aviv Rosenberg · Shie Mannor
2020 Poster: Learning Mixtures of Graphs from Epidemic Cascades
Jessica Hoffmann · Soumya Basu · Surbhi Goel · Constantine Caramanis
2020 Poster: Multi-step Greedy Reinforcement Learning Algorithms
Manan Tomar · Yonathan Efroni · Mohammad Ghavamzadeh
2019 Poster: Robust Estimation of Tree Structured Gaussian Graphical Models
Ashish Katiyar · Jessica Hoffmann · Constantine Caramanis
2019 Oral: Robust Estimation of Tree Structured Gaussian Graphical Models
Ashish Katiyar · Jessica Hoffmann · Constantine Caramanis
2019 Poster: Exploration Conscious Reinforcement Learning Revisited
Lior Shani · Yonathan Efroni · Shie Mannor
2019 Poster: Action Robust Reinforcement Learning and Applications in Continuous Control
Chen Tessler · Yonathan Efroni · Shie Mannor
2019 Oral: Exploration Conscious Reinforcement Learning Revisited
Lior Shani · Yonathan Efroni · Shie Mannor
2019 Oral: Action Robust Reinforcement Learning and Applications in Continuous Control
Chen Tessler · Yonathan Efroni · Shie Mannor
2018 Poster: Beyond the One-Step Greedy Approach in Reinforcement Learning
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor
2018 Oral: Beyond the One-Step Greedy Approach in Reinforcement Learning
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor
2017 Workshop: Lifelong Learning: A Reinforcement Learning Approach
Sarath Chandar · Balaraman Ravindran · Daniel J. Mankowitz · Shie Mannor · Tom Zahavy
2017 Poster: Consistent On-Line Off-Policy Evaluation
Assaf Hallak · Shie Mannor
2017 Talk: Consistent On-Line Off-Policy Evaluation
Assaf Hallak · Shie Mannor
2017 Poster: End-to-End Differentiable Adversarial Imitation Learning
Nir Baram · Oron Anschel · Itai Caspi · Shie Mannor
2017 Talk: End-to-End Differentiable Adversarial Imitation Learning
Nir Baram · Oron Anschel · Itai Caspi · Shie Mannor