Timezone: »
Poster
Reward-Free Exploration for Reinforcement Learning
Chi Jin · Akshay Krishnamurthy · Max Simchowitz · Tiancheng Yu
Thu Jul 16 06:00 AM -- 06:45 AM & Thu Jul 16 05:00 PM -- 05:45 PM (PDT) @
Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose the following ``reward-free RL'' framework. In the exploration phase, the agent first collects trajectories from an MDP $M$ without a pre-specified reward function. After exploration, it is tasked with computing a near-policies under the transitions of $\mathcal{M}$ for a collection of given reward functions. This framework is particularly suitable where there are many reward functions of interest, or where the reward function is shaped by an external agent to elicit desired behavior.
We give an efficient algorithm that conducts $\widetilde{O}(S^2A\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration, and returns $\epsilon$-suboptimal policies for an arbitrary number of reward functions. We achieve this by finding exploratory policies that jointly visit each ``significant'' state with probability proportional to its maximum visitation probability under any possible policy. Moreover, our planning procedure can be instantiated by any black-box approximate planner, such as value iteration or natural policy gradient. Finally, we give a nearly-matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting.
Author Information
Chi Jin (Princeton University)
Akshay Krishnamurthy (Microsoft Research)
Max Simchowitz (UC Berkeley)
Tiancheng Yu (MIT)
I am quant researcher at Two Sigma. I completed my PhD in MIT EECS in 2023.
More from the Same Authors
-
2021 : Provable RL with Exogenous Distractors via Multistep Inverse Dynamics »
Yonathan Efroni · Dipendra Misra · Akshay Krishnamurthy · Alekh Agarwal · John Langford -
2021 : The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces »
Chi Jin · Qinghua Liu · Tiancheng Yu -
2021 : Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms »
Chi Jin · Qinghua Liu · Sobhan Miryoosefi -
2021 : Sparsity in the Partially Controllable LQR »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2021 : Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games »
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong -
2023 : Exposing Attention Glitches with Flip-Flop Language Modeling »
Bingbin Liu · Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Cyril Zhang -
2023 : Exposing Attention Glitches with Flip-Flop Language Modeling »
Bingbin Liu · Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Cyril Zhang -
2023 : Is RLHF More Difficult than Standard RL? »
Chi Jin -
2023 Poster: Efficient displacement convex optimization with particle gradient descent »
Hadi Daneshmand · Jason Lee · Chi Jin -
2023 Poster: Streaming Active Learning with Deep Neural Networks »
Akanksha Saran · Safoora Yousefi · Akshay Krishnamurthy · John Langford · Jordan Ash -
2023 Poster: Statistical Learning under Heterogenous Distribution Shift »
Max Simchowitz · Anurag Ajay · Pulkit Agrawal · Akshay Krishnamurthy -
2022 Poster: A Simple Reward-free Approach to Constrained Reinforcement Learning »
Sobhan Miryoosefi · Chi Jin -
2022 Poster: Universal and data-adaptive algorithms for model selection in linear contextual bandits »
Vidya Muthukumar · Akshay Krishnamurthy -
2022 Spotlight: Universal and data-adaptive algorithms for model selection in linear contextual bandits »
Vidya Muthukumar · Akshay Krishnamurthy -
2022 Spotlight: A Simple Reward-free Approach to Constrained Reinforcement Learning »
Sobhan Miryoosefi · Chi Jin -
2022 Poster: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Poster: The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces »
Chi Jin · Qinghua Liu · Tiancheng Yu -
2022 Poster: Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits »
Qinghua Liu · Yuanhao Wang · Chi Jin -
2022 Poster: Near-Optimal Learning of Extensive-Form Games with Imperfect Information »
Yu Bai · Chi Jin · Song Mei · Tiancheng Yu -
2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Spotlight: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Spotlight: Near-Optimal Learning of Extensive-Form Games with Imperfect Information »
Yu Bai · Chi Jin · Song Mei · Tiancheng Yu -
2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Oral: Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits »
Qinghua Liu · Yuanhao Wang · Chi Jin -
2022 Spotlight: The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces »
Chi Jin · Qinghua Liu · Tiancheng Yu -
2022 Poster: Provable Reinforcement Learning with a Short-Term Memory »
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi -
2022 Spotlight: Provable Reinforcement Learning with a Short-Term Memory »
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi -
2021 : Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games »
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong -
2021 : Sparsity in the Partially Controllable LQR »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2021 Poster: Near-Optimal Representation Learning for Linear Bandits and Linear RL »
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang -
2021 Poster: Task-Optimal Exploration in Linear Dynamical Systems »
Andrew Wagenmaker · Max Simchowitz · Kevin Jamieson -
2021 Poster: A Sharp Analysis of Model-based Reinforcement Learning with Self-Play »
Qinghua Liu · Tiancheng Yu · Yu Bai · Chi Jin -
2021 Poster: Provable Meta-Learning of Linear Representations »
Nilesh Tripuraneni · Chi Jin · Michael Jordan -
2021 Poster: Provably Efficient Algorithms for Multi-Objective Competitive RL »
Tiancheng Yu · Yi Tian · Jingzhao Zhang · Suvrit Sra -
2021 Poster: Online Learning in Unknown Markov Games »
Yi Tian · Yuanhao Wang · Tiancheng Yu · Suvrit Sra -
2021 Spotlight: Provable Meta-Learning of Linear Representations »
Nilesh Tripuraneni · Chi Jin · Michael Jordan -
2021 Spotlight: A Sharp Analysis of Model-based Reinforcement Learning with Self-Play »
Qinghua Liu · Tiancheng Yu · Yu Bai · Chi Jin -
2021 Spotlight: Online Learning in Unknown Markov Games »
Yi Tian · Yuanhao Wang · Tiancheng Yu · Suvrit Sra -
2021 Oral: Task-Optimal Exploration in Linear Dynamical Systems »
Andrew Wagenmaker · Max Simchowitz · Kevin Jamieson -
2021 Oral: Provably Efficient Algorithms for Multi-Objective Competitive RL »
Tiancheng Yu · Yi Tian · Jingzhao Zhang · Suvrit Sra -
2021 Spotlight: Near-Optimal Representation Learning for Linear Bandits and Linear RL »
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang -
2021 Poster: Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning »
Yaqi Duan · Chi Jin · Zhiyuan Li -
2021 Spotlight: Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning »
Yaqi Duan · Chi Jin · Zhiyuan Li -
2020 : Representation learning and exploration in reinforcement learning - Akshay Krishnamurthy »
Akshay Krishnamurthy -
2020 : Short Talk 5 - Near-Optimal Reinforcement Learning with Self-Play »
Tiancheng Yu -
2020 : Speaker Panel »
Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy -
2020 Workshop: Theoretical Foundations of Reinforcement Learning »
Emma Brunskill · Thodoris Lykouris · Max Simchowitz · Wen Sun · Mengdi Wang -
2020 Poster: Naive Exploration is Optimal for Online LQR »
Max Simchowitz · Dylan Foster -
2020 Poster: Doubly robust off-policy evaluation with shrinkage »
Yi Su · Maria Dimakopoulou · Akshay Krishnamurthy · Miroslav Dudik -
2020 Poster: Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning »
Dipendra Kumar Misra · Mikael Henaff · Akshay Krishnamurthy · John Langford -
2020 Poster: On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems »
Tianyi Lin · Chi Jin · Michael Jordan -
2020 Poster: Provable Self-Play Algorithms for Competitive Reinforcement Learning »
Yu Bai · Chi Jin -
2020 Poster: Balancing Competing Objectives with Noisy Data: Score-Based Classifiers for Welfare-Aware Machine Learning »
Esther Rolf · Max Simchowitz · Sarah Dean · Lydia T. Liu · Daniel Bjorkegren · Moritz Hardt · Joshua Blumenstock -
2020 Poster: Adaptive Estimator Selection for Off-Policy Evaluation »
Yi Su · Pavithra Srinath · Akshay Krishnamurthy -
2020 Poster: Logarithmic Regret for Adversarial Online Control »
Dylan Foster · Max Simchowitz -
2020 Poster: Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition »
Chi Jin · Tiancheng Jin · Haipeng Luo · Suvrit Sra · Tiancheng Yu -
2020 Poster: Provably Efficient Exploration in Policy Optimization »
Qi Cai · Zhuoran Yang · Chi Jin · Zhaoran Wang -
2020 Poster: Private Reinforcement Learning with PAC and Regret Guarantees »
Giuseppe Vietri · Borja de Balle Pigem · Akshay Krishnamurthy · Steven Wu -
2020 Poster: What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization? »
Chi Jin · Praneeth Netrapalli · Michael Jordan -
2019 Poster: Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments »
Kirthevasan Kandasamy · Willie Neiswanger · Reed Zhang · Akshay Krishnamurthy · Jeff Schneider · Barnabás Póczos -
2019 Oral: Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments »
Kirthevasan Kandasamy · Willie Neiswanger · Reed Zhang · Akshay Krishnamurthy · Jeff Schneider · Barnabás Póczos -
2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding »
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford -
2019 Poster: The Implicit Fairness Criterion of Unconstrained Learning »
Lydia T. Liu · Max Simchowitz · Moritz Hardt -
2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding »
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford -
2019 Oral: The Implicit Fairness Criterion of Unconstrained Learning »
Lydia T. Liu · Max Simchowitz · Moritz Hardt -
2018 Poster: Semiparametric Contextual Bandits »
Akshay Krishnamurthy · Steven Wu · Vasilis Syrgkanis -
2018 Oral: Semiparametric Contextual Bandits »
Akshay Krishnamurthy · Steven Wu · Vasilis Syrgkanis -
2018 Poster: Delayed Impact of Fair Machine Learning »
Lydia T. Liu · Sarah Dean · Esther Rolf · Max Simchowitz · Moritz Hardt -
2018 Oral: Delayed Impact of Fair Machine Learning »
Lydia T. Liu · Sarah Dean · Esther Rolf · Max Simchowitz · Moritz Hardt -
2017 Poster: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2017 Talk: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2017 Poster: Active Learning for Cost-Sensitive Classification »
Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford -
2017 Talk: Active Learning for Cost-Sensitive Classification »
Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford