Exploration in unknown environments is a fundamental problem in reinforcement learning and control. In this work, we study task-guided exploration and determine precisely what an agent must learn about its environment in order to complete a particular task. Formally, we study a broad class of decision-making problems in the setting of linear dynamical systems, a class that includes the linear quadratic regulator problem. We provide instance- and task-dependent lower bounds which explicitly quantify the difficulty of completing a task of interest. Motivated by our lower bound, we propose a computationally efficient experiment-design-based exploration algorithm. We show that it optimally explores the environment, collecting precisely the information needed to complete the task, and provide finite-time bounds guaranteeing that it achieves the instance- and task-optimal sample complexity, up to constant factors. Through several examples of the linear quadratic regulator problem, we show that performing task-guided exploration provably improves on exploration schemes that do not take the task of interest into account. Along the way, we establish that certainty-equivalence decision making is instance- and task-optimal, and obtain the first instance-optimal algorithm for the linear quadratic regulator problem. We conclude with several experiments illustrating the effectiveness of our approach in practice.
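As a rough illustration of the certainty-equivalence pipeline referenced in the abstract, the sketch below excites an unknown linear system, estimates its dynamics (A, B) by least squares, and then computes the LQR controller for the estimates as if they were exact. This is only a sketch under assumed values: the 2-state system, noise level, horizon, and the i.i.d. Gaussian excitation are placeholders, not quantities from the paper, whose algorithm instead designs the exploration inputs for the task at hand.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical system used only for illustration.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])   # unknown dynamics
B_true = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)                   # LQR cost matrices
T = 500                                       # exploration horizon

# 1) Explore: drive the system with random inputs and record transitions.
#    (The paper replaces this step with task-guided experiment design.)
xs, us, xs_next = [], [], []
x = np.zeros(2)
for _ in range(T):
    u = rng.normal(size=1)
    x_next = A_true @ x + B_true @ u + 1e-2 * rng.normal(size=2)
    xs.append(x); us.append(u); xs_next.append(x_next)
    x = x_next

# 2) Estimate (A, B) by least squares on the collected transitions.
Z = np.hstack([np.array(xs), np.array(us)])        # regressors [x_t, u_t]
Theta, *_ = np.linalg.lstsq(Z, np.array(xs_next), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

# 3) Certainty equivalence: plug the estimates into the discrete Riccati
#    equation and act as if they were the true system.
P = solve_discrete_are(A_hat, B_hat, Q, R)
K_ce = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
print("certainty-equivalence LQR gain:\n", K_ce)
```

Replacing the random excitation in step 1 with inputs chosen to gather exactly the information the task requires is the refinement the paper studies.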
Author Information
Andrew Wagenmaker (University of Washington)
Max Simchowitz (UC Berkeley)
Kevin Jamieson (University of Washington)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Task-Optimal Exploration in Linear Dynamical Systems
  Thu. Jul 22nd, 04:00 -- 06:00 AM
More from the Same Authors
- 2023: LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
  Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak
- 2023 Poster: Leveraging Offline Data in Online Reinforcement Learning
  Andrew Wagenmaker · Aldo Pacchiano
- 2023 Poster: Improved Active Multi-Task Representation Learning via Lasso
  Yiping Wang · Yifang Chen · Kevin Jamieson · Simon Du
- 2022 Poster: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach
  Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson
- 2022 Poster: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes
  Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson
- 2022 Spotlight: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes
  Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson
- 2022 Oral: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach
  Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson
- 2022 Poster: Active Multi-Task Representation Learning
  Yifang Chen · Kevin Jamieson · Simon Du
- 2022 Spotlight: Active Multi-Task Representation Learning
  Yifang Chen · Kevin Jamieson · Simon Du
- 2021 Poster: Improved Algorithms for Agnostic Pool-based Active Classification
  Julian Katz-Samuels · Jifan Zhang · Lalit Jain · Kevin Jamieson
- 2021 Spotlight: Improved Algorithms for Agnostic Pool-based Active Classification
  Julian Katz-Samuels · Jifan Zhang · Lalit Jain · Kevin Jamieson
- 2021 Poster: Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
  Yifang Chen · Simon Du · Kevin Jamieson
- 2021 Spotlight: Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
  Yifang Chen · Simon Du · Kevin Jamieson
- 2021 Poster: High-dimensional Experimental Design and Kernel Bandits
  Romain Camilleri · Kevin Jamieson · Julian Katz-Samuels
- 2021 Oral: High-dimensional Experimental Design and Kernel Bandits
  Romain Camilleri · Kevin Jamieson · Julian Katz-Samuels
- 2020 Workshop: Theoretical Foundations of Reinforcement Learning
  Emma Brunskill · Thodoris Lykouris · Max Simchowitz · Wen Sun · Mengdi Wang
- 2020 Poster: Naive Exploration is Optimal for Online LQR
  Max Simchowitz · Dylan Foster
- 2020 Poster: Reward-Free Exploration for Reinforcement Learning
  Chi Jin · Akshay Krishnamurthy · Max Simchowitz · Tiancheng Yu
- 2020 Poster: Balancing Competing Objectives with Noisy Data: Score-Based Classifiers for Welfare-Aware Machine Learning
  Esther Rolf · Max Simchowitz · Sarah Dean · Lydia T. Liu · Daniel Bjorkegren · Moritz Hardt · Joshua Blumenstock
- 2020 Poster: Logarithmic Regret for Adversarial Online Control
  Dylan Foster · Max Simchowitz
- 2020 Poster: Estimating the Number and Effect Sizes of Non-null Hypotheses
  Jennifer Brennan · Ramya Korlakai Vinayak · Kevin Jamieson
- 2019 Poster: The Implicit Fairness Criterion of Unconstrained Learning
  Lydia T. Liu · Max Simchowitz · Moritz Hardt
- 2019 Oral: The Implicit Fairness Criterion of Unconstrained Learning
  Lydia T. Liu · Max Simchowitz · Moritz Hardt
- 2018 Poster: Firing Bandits: Optimizing Crowdfunding
  Lalit Jain · Kevin Jamieson
- 2018 Oral: Firing Bandits: Optimizing Crowdfunding
  Lalit Jain · Kevin Jamieson
- 2018 Poster: Delayed Impact of Fair Machine Learning
  Lydia T. Liu · Sarah Dean · Esther Rolf · Max Simchowitz · Moritz Hardt
- 2018 Oral: Delayed Impact of Fair Machine Learning
  Lydia T. Liu · Sarah Dean · Esther Rolf · Max Simchowitz · Moritz Hardt