Linear fixed point equations in Hilbert spaces naturally arise from the policy evaluation problem in reinforcement learning. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak-Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information-theoretic methods, we also establish lower bounds showing that the approximation factor cannot be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.
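The scheme the abstract analyzes can be illustrated concretely: TD(0) with linear function approximation is a linear stochastic approximation method, and averaging its iterates gives the Polyak-Ruppert estimate. Below is a minimal numpy sketch on a toy Markov reward process; the transition matrix, features, and step-size schedule here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process (illustrative assumption): 5 states, 2-dim features.
n_states, d, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
r = rng.uniform(0.0, 1.0, size=n_states)             # per-state rewards
Phi = rng.normal(size=(n_states, d))                 # known low-dim feature subspace

theta = np.zeros(d)      # current stochastic-approximation iterate
theta_bar = np.zeros(d)  # Polyak-Ruppert average of the iterates
s = 0
for t in range(1, 50_001):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) update: a linear stochastic approximation step toward the
    # projected linear fixed point of the Bellman equation.
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta = theta + 0.5 * t ** -0.75 * td_error * Phi[s]  # slowly decaying step
    theta_bar += (theta - theta_bar) / t                  # running iterate average
    s = s_next

# Closed-form projected fixed point (stationary-distribution weighting), for reference.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmax(np.real(evals))])
mu = mu / mu.sum()
A = Phi.T @ np.diag(mu) @ (np.eye(n_states) - gamma * P) @ Phi
b = Phi.T @ np.diag(mu) @ r
theta_star = np.linalg.solve(A, b)
print("averaged iterate:", theta_bar, " fixed point:", theta_star)
```

The averaged iterate `theta_bar`, rather than the last iterate `theta`, is the quantity whose mean-squared error the paper's upper bound controls, splitting it into an approximation term (how well the feature subspace can represent the value function) and a statistical term (noise projected onto that subspace).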
Author Information
Wenlong Mou (UC Berkeley)
Ashwin Pananjady (Georgia Institute of Technology)
Martin Wainwright (UC Berkeley / Voleon)
More from the Same Authors
- 2021: Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
  Andrea Zanette · Martin Wainwright · Emma Brunskill
- 2022 Poster: Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning
  Andrea Zanette · Martin Wainwright
- 2022 Poster: A new similarity measure for covariate shift with applications to nonparametric regression
  Reese Pathak · Cong Ma · Martin Wainwright
- 2022 Spotlight: Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning
  Andrea Zanette · Martin Wainwright
- 2022 Oral: A new similarity measure for covariate shift with applications to nonparametric regression
  Reese Pathak · Cong Ma · Martin Wainwright
- 2018 Poster: Dropout Training, Data-dependent Regularization, and Generalization Bounds
  Wenlong Mou · Yuchen Zhou · Jun Gao · Liwei Wang
- 2018 Oral: Dropout Training, Data-dependent Regularization, and Generalization Bounds
  Wenlong Mou · Yuchen Zhou · Jun Gao · Liwei Wang
- 2017 Poster: Differentially Private Clustering in High-Dimensional Euclidean Spaces
  Nina Balcan · Travis Dick · Yingyu Liang · Wenlong Mou · Hongyang Zhang
- 2017 Poster: Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible
  Kai Zheng · Wenlong Mou · Liwei Wang
- 2017 Talk: Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible
  Kai Zheng · Wenlong Mou · Liwei Wang
- 2017 Talk: Differentially Private Clustering in High-Dimensional Euclidean Spaces
  Nina Balcan · Travis Dick · Yingyu Liang · Wenlong Mou · Hongyang Zhang