The Q-learning algorithm is a simple, fundamental, and practically very effective reinforcement learning algorithm. However, the basic protocol can exhibit unstable behavior when implemented even with simple linear function approximation. While tools like target networks and experience replay are often implemented to stabilize the learning process, the individual contribution of each of these mechanisms is not well understood theoretically. This work proposes an exploration variant of the basic Q-learning protocol with linear function approximation. Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second-order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state-of-the-art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of elapsed time steps. Furthermore, we show that the performance of the algorithm degrades very gracefully under a new, more permissive notion of approximation error. Finally, the algorithm partially inherits problem-dependent regret bounds, as a function of the number of 'effective' feature dimensions.
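To make one of the stabilizers named above concrete, here is a minimal sketch of Q-learning with linear function approximation and a periodically synchronized target network. This is not the authors' algorithm: it uses an ordinary first-order semi-gradient update rather than the paper's second-order rule, and it omits the experience-replay-like mechanism. The `env` interface (`reset`, `step`, `actions`, `sample_action`) and the `featurize` map are hypothetical placeholders introduced only for illustration.

```python
import numpy as np

def linear_q_learning(env, featurize, num_features, num_episodes=500,
                      horizon=100, gamma=0.99, lr=0.05, sync_every=50,
                      epsilon=0.1, seed=0):
    """Q-learning with linear function approximation and a periodically
    synchronized target network (a common stabilizer, as in the abstract).

    `featurize(state, action)` must return a length-`num_features` vector.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(num_features)   # online weights
    w_target = w.copy()          # frozen target-network weights
    step = 0

    def q(weights, s, a):
        # Linear architecture: Q(s, a) = <weights, phi(s, a)>
        return weights @ featurize(s, a)

    for _ in range(num_episodes):
        s = env.reset()
        for _ in range(horizon):
            # Epsilon-greedy exploration on the online network.
            if rng.random() < epsilon:
                a = env.sample_action()
            else:
                a = max(env.actions(s), key=lambda a_: q(w, s, a_))

            s_next, r, done = env.step(a)

            # Bootstrap target computed with the *frozen* weights, which
            # damps the feedback loop that destabilizes plain Q-learning.
            best_next = 0.0 if done else max(
                q(w_target, s_next, a_) for a_ in env.actions(s_next))
            td_error = r + gamma * best_next - q(w, s, a)

            # First-order semi-gradient step (the paper instead analyzes
            # a second-order update rule).
            w += lr * td_error * featurize(s, a)

            step += 1
            if step % sync_every == 0:
                w_target = w.copy()  # refresh the target network

            if done:
                break
            s = s_next
    return w
```

Freezing the bootstrap target breaks the circular dependence between the value estimate and its own regression target, which is the instability that plain linear Q-learning can exhibit; note that the loop stores only the two weight vectors, so space complexity stays independent of the number of elapsed steps.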
Author Information
Andrea Zanette (University of California, Berkeley)
Martin Wainwright (UC Berkeley / Voleon)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning
  Wed, Jul 20 through Thu, Jul 21 · Room Hall E #908
More from the Same Authors
- 2021: Optimal and instance-dependent oracle inequalities for policy evaluation
  Wenlong Mou · Ashwin Pananjady · Martin Wainwright
- 2021: Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
  Andrea Zanette · Martin Wainwright · Emma Brunskill
- 2023 Poster: When is Realizability Sufficient for Off-Policy Reinforcement Learning?
  Andrea Zanette
- 2022 Poster: A new similarity measure for covariate shift with applications to nonparametric regression
  Reese Pathak · Cong Ma · Martin Wainwright
- 2022 Oral: A new similarity measure for covariate shift with applications to nonparametric regression
  Reese Pathak · Cong Ma · Martin Wainwright