Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation
Semih Cayci · Siddhartha Satpathi · Niao He · R Srikant

We study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}. We consider two algorithms used in practice, \emph{projection-free} and \emph{max-norm regularized} Neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms, both in terms of sample complexity and overparameterization. In particular, we prove that max-norm regularization achieves state-of-the-art sample complexity and overparameterization bounds by exploiting the geometry of the neural tangent kernel (NTK) function class. The results in this work rely on a novel Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
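The following is a minimal illustrative sketch, not the paper's exact algorithm: it assumes a two-layer ReLU network of width m with fixed output-layer signs (an NTK-style parameterization), semi-gradient TD(0) updates, and a max-norm projection that keeps each hidden-unit weight vector within a ball around its initialization. The toy dynamics, step size, and projection radius are placeholders chosen only to make the sketch runnable.

```python
import numpy as np

# Illustrative sketch of max-norm regularized Neural TD(0) with a two-layer
# ReLU network f(s; W) = (1/sqrt(m)) * sum_i a_i * relu(w_i^T s).
# The MDP, step size, and projection radius below are placeholders.

rng = np.random.default_rng(0)
d, m = 4, 256                              # state dimension, network width
gamma, alpha, radius = 0.9, 0.05, 1.0      # discount, step size, max-norm radius

W0 = rng.normal(size=(m, d)) / np.sqrt(d)  # random initialization (kept for projection)
a = rng.choice([-1.0, 1.0], size=m)        # fixed output-layer signs
W = W0.copy()

def value(s, W):
    """Two-layer ReLU value estimate."""
    return a @ np.maximum(W @ s, 0.0) / np.sqrt(m)

def grad(s, W):
    """Gradient of value(s, W) with respect to each hidden-unit weight vector."""
    active = (W @ s > 0.0).astype(float)                      # ReLU indicators, shape (m,)
    return (a * active)[:, None] * s[None, :] / np.sqrt(m)    # shape (m, d)

def max_norm_project(W, W0, radius):
    """Project each hidden unit back into a ball of the given radius around its init."""
    diff = W - W0
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W0 + diff * scale

def toy_step(s):
    """Placeholder dynamics: random walk on the unit sphere with a simple reward."""
    s_next = s + 0.1 * rng.normal(size=d)
    s_next /= np.linalg.norm(s_next)
    return s_next[0], s_next

s = rng.normal(size=d)
s /= np.linalg.norm(s)
for t in range(5000):
    r, s_next = toy_step(s)
    td_error = r + gamma * value(s_next, W) - value(s, W)
    W = W + alpha * td_error * grad(s, W)   # semi-gradient TD(0) update
    W = max_norm_project(W, W0, radius)     # max-norm regularization step
    s = s_next

print("sample value estimate:", value(s, W))
```

Dropping the `max_norm_project` call recovers the projection-free variant; the projection step is what constrains the parameters to a neighborhood of their initialization, which is the geometric property the paper's analysis exploits.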

Author Information

Semih Cayci (University of Illinois at Urbana-Champaign)
Siddhartha Satpathi (University of Illinois at Urbana-Champaign)
Niao He (ETH Zurich)
R Srikant (University of Illinois at Urbana-Champaign)