Timezone: »
Poster
Naive Exploration is Optimal for Online LQR
Max Simchowitz · Dylan Foster
Thu Jul 16 12:00 PM -- 12:45 PM & Thu Jul 16 11:00 PM -- 11:45 PM (PDT) @
We consider the problem of online adaptive control of the linear quadratic regulator, where the true system parameters are unknown. We prove new upper and lower bounds demonstrating that the optimal regret scales as $\tilde{\Theta ({\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T}})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state. Notably, our lower bounds rule out the possibility of a $\mathrm{poly}(\log{}T)$-regret algorithm, which had been conjectured due to the apparent strong convexity of the problem. Our upper bound is attained by a simple variant of certainty equivalent control, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise. While this approach was shown to achieve $\sqrt{T}$ regret by Mania et al. (2019), we show that if the learner continually refines their estimates of the system matrices, the method attains optimal dimension dependence as well.
Central to our upper and lower bounds is a new approach for controlling perturbations of Riccati equations called the self-bounding ODE method, which we use to derive suboptimality bounds for the certainty equivalent controller synthesized from estimated system dynamics. This in turn enables regret upper bounds which hold for any stabilizable instance and scale with natural control-theoretic quantities.
Author Information
Max Simchowitz (UC Berkeley)
Dylan Foster (MIT)
More from the Same Authors
-
2022 : Interaction-Grounded Learning with Action-inclusive Feedback »
Tengyang Xie · Akanksha Saran · Dylan Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford -
2023 Poster: Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL »
Zakaria Mhammedi · Dylan Foster · Alexander Rakhlin -
2023 Oral: Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL »
Zakaria Mhammedi · Dylan Foster · Alexander Rakhlin -
2023 Poster: Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games »
Dylan Foster · Noah Golowich · Sham Kakade -
2022 Poster: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 Spotlight: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 : Q&A II »
Dylan Foster · Alexander Rakhlin -
2022 : Bridging Learning and Decision Making: Part II »
Dylan Foster -
2022 : Q&A »
Dylan Foster · Alexander Rakhlin -
2022 Tutorial: Bridging Learning and Decision Making »
Dylan Foster · Alexander Rakhlin -
2021 Poster: Task-Optimal Exploration in Linear Dynamical Systems »
Andrew Wagenmaker · Max Simchowitz · Kevin Jamieson -
2021 Oral: Task-Optimal Exploration in Linear Dynamical Systems »
Andrew Wagenmaker · Max Simchowitz · Kevin Jamieson -
2020 Workshop: Theoretical Foundations of Reinforcement Learning »
Emma Brunskill · Thodoris Lykouris · Max Simchowitz · Wen Sun · Mengdi Wang -
2020 Poster: Reward-Free Exploration for Reinforcement Learning »
Chi Jin · Akshay Krishnamurthy · Max Simchowitz · Tiancheng Yu -
2020 Poster: Improved Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance »
Blair Bilodeau · Dylan Foster · Daniel Roy -
2020 Poster: Balancing Competing Objectives with Noisy Data: Score-Based Classifiers for Welfare-Aware Machine Learning »
Esther Rolf · Max Simchowitz · Sarah Dean · Lydia T. Liu · Daniel Bjorkegren · Moritz Hardt · Joshua Blumenstock -
2020 Poster: Logarithmic Regret for Adversarial Online Control »
Dylan Foster · Max Simchowitz -
2020 Poster: Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles »
Dylan Foster · Alexander Rakhlin -
2019 Poster: Distributed Learning with Sublinear Communication »
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan -
2019 Oral: Distributed Learning with Sublinear Communication »
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan -
2019 Poster: The Implicit Fairness Criterion of Unconstrained Learning »
Lydia T. Liu · Max Simchowitz · Moritz Hardt -
2019 Oral: The Implicit Fairness Criterion of Unconstrained Learning »
Lydia T. Liu · Max Simchowitz · Moritz Hardt -
2018 Poster: Delayed Impact of Fair Machine Learning »
Lydia T. Liu · Sarah Dean · Esther Rolf · Max Simchowitz · Moritz Hardt -
2018 Poster: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Delayed Impact of Fair Machine Learning »
Lydia T. Liu · Sarah Dean · Esther Rolf · Max Simchowitz · Moritz Hardt