We study efficient algorithms for reinforcement learning in Markov decision processes whose complexity is independent of the number of states. This formulation succinctly captures large-scale problems, but is also known to be computationally hard in its general form. We consider the methodology of boosting, borrowed from supervised learning, for converting weak learners into an accurate policy. The notion of weak learning we study is that of sample-based approximate optimization of linear functions over policies. We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods until global optimality is reached. We prove sample complexity and running time bounds for our method that are polynomial in the natural parameters of the problem: approximation guarantee, discount factor, distribution mismatch, and number of actions. In particular, our bound does not explicitly depend on the number of states.
A technical difficulty in applying previous boosting results is that the value function over policy space is not convex. We show how to use a non-convex variant of the Frank-Wolfe method, coupled with recent advances in gradient boosting that allow incorporating a weak learner with a multiplicative approximation guarantee, to overcome the non-convexity and attain global convergence.
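The abstract describes the approach only at a high level. As a rough, hypothetical sketch of the Frank-Wolfe template it builds on (illustrated here on a toy convex problem over the probability simplex, not the paper's non-convex policy objective), each iteration queries an oracle that optimizes a linear function over the feasible set and mixes its answer into the current iterate. The names `weak_oracle` and `frank_wolfe` are illustrative, not from the paper:

```python
import numpy as np

def weak_oracle(g):
    """Hypothetical stand-in for a weak learner: solves the linear
    subproblem min over the simplex of <g, v> by picking the best vertex.
    (The paper's weak learner is a sample-based *approximate* linear
    optimizer over policies; here the oracle is exact for simplicity.)"""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

def frank_wolfe(grad, x0, steps=500):
    """Generic Frank-Wolfe loop: repeatedly query the linear-optimization
    oracle at the current gradient and step toward its answer, so the
    iterate is always a convex combination of oracle outputs."""
    x = x0.copy()
    for t in range(1, steps + 1):
        eta = 2.0 / (t + 2)          # standard step-size schedule
        v = weak_oracle(grad(x))     # oracle call = one weak-learning step
        x = (1 - eta) * x + eta * v  # mix the weak hypothesis into the iterate
    return x

# Toy convex objective f(x) = ||x - c||^2 over the probability simplex;
# since c lies in the simplex, the minimizer is c itself.
c = np.array([0.2, 0.5, 0.3])
x = frank_wolfe(lambda x: 2.0 * (x - c), np.ones(3) / 3)
```

Note that the iterate stays feasible by construction, since it is always a convex combination of simplex vertices; the paper's contribution is making this template work when the objective is non-convex and the oracle is only a weak, approximate optimizer.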
Author Information
Nataly Brukhim (Princeton University)
Elad Hazan (Princeton University)
Karan Singh (Microsoft Research)
More from the Same Authors
- 2021 : Robust online control with model misspecification
  Xinyi Chen · Udaya Ghai · Elad Hazan · Alexandre Megretski
- 2022 : Non-convex online learning via algorithmic equivalence
  Udaya Ghai · Zhou Lu · Elad Hazan
- 2022 Poster: A Regret Minimization Approach to Multi-Agent Control
  Udaya Ghai · Udari Madhushani · Naomi Leonard · Elad Hazan
- 2022 Spotlight: A Regret Minimization Approach to Multi-Agent Control
  Udaya Ghai · Udari Madhushani · Naomi Leonard · Elad Hazan
- 2021 Poster: Boosting for Online Convex Optimization
  Elad Hazan · Karan Singh
- 2021 Spotlight: Boosting for Online Convex Optimization
  Elad Hazan · Karan Singh
- 2021 Poster: A Regret Minimization Approach to Iterative Learning Control
  Naman Agarwal · Elad Hazan · Anirudha Majumdar · Karan Singh
- 2021 Spotlight: A Regret Minimization Approach to Iterative Learning Control
  Naman Agarwal · Elad Hazan · Anirudha Majumdar · Karan Singh
- 2021 : Online and non-stochastic control
  Karan Singh
- 2021 Tutorial: Online and non-stochastic control
  Elad Hazan · Karan Singh
- 2021 : Online and non-stochastic control
  Elad Hazan
- 2020 Poster: Boosting for Control of Dynamical Systems
  Naman Agarwal · Nataly Brukhim · Elad Hazan · Zhou Lu
- 2019 Poster: Efficient Full-Matrix Adaptive Regularization
  Naman Agarwal · Brian Bullins · Xinyi Chen · Elad Hazan · Karan Singh · Cyril Zhang · Yi Zhang
- 2019 Poster: Online Control with Adversarial Disturbances
  Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
- 2019 Oral: Efficient Full-Matrix Adaptive Regularization
  Naman Agarwal · Brian Bullins · Xinyi Chen · Elad Hazan · Karan Singh · Cyril Zhang · Yi Zhang
- 2019 Oral: Online Control with Adversarial Disturbances
  Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
- 2019 Poster: Provably Efficient Maximum Entropy Exploration
  Elad Hazan · Sham Kakade · Karan Singh · Abby Van Soest
- 2019 Oral: Provably Efficient Maximum Entropy Exploration
  Elad Hazan · Sham Kakade · Karan Singh · Abby Van Soest
- 2018 Poster: On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
  Sanjeev Arora · Nadav Cohen · Elad Hazan
- 2018 Oral: On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
  Sanjeev Arora · Nadav Cohen · Elad Hazan
- 2017 Poster: Efficient Regret Minimization in Non-Convex Games
  Elad Hazan · Karan Singh · Cyril Zhang
- 2017 Talk: Efficient Regret Minimization in Non-Convex Games
  Elad Hazan · Karan Singh · Cyril Zhang