Poster in Workshop: New Frontiers in Learning, Control, and Dynamical Systems

Accelerated Policy Gradient: On the Nesterov Momentum for Reinforcement Learning

Yen-Ju Chen · Nai-Chieh Huang · Ping-Chun Hsieh


Abstract: Policy gradient methods have recently been shown to enjoy global convergence at a $\Theta(1/t)$ rate in the non-regularized tabular softmax setting. Accordingly, one important research question is whether this convergence rate can be further improved using only first-order updates. In this paper, we answer the above question from the perspective of momentum by adapting the celebrated Nesterov accelerated gradient (NAG) method to reinforcement learning (RL), termed *Accelerated Policy Gradient* (APG). To demonstrate the potential of APG in achieving faster global convergence, we start from the bandit setting and formally show that, with access to the true gradient, APG with softmax policy parametrization converges to an optimal policy at a $\tilde{O}(1/t^2)$ rate. To the best of our knowledge, this is the first characterization of the global convergence rate of NAG in the context of RL. Notably, our analysis relies on one interesting finding: regardless of the initialization, APG could enter a locally concave regime within a finite number of iterations, where it can benefit significantly from the momentum. Through numerical validation, we confirm that APG exhibits an $\tilde{O}(1/t^2)$ rate in the bandit setting and preserves this rate in various Markov decision process instances, showing that APG can significantly improve the convergence behavior over standard policy gradient.
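As a rough illustration of the kind of update the abstract describes, below is a minimal sketch of a Nesterov-style momentum update applied to the true softmax policy gradient in a three-armed bandit. The reward values, step size `eta`, and momentum schedule `t/(t+3)` are illustrative assumptions, not the paper's exact APG specification.

```python
import numpy as np

# Minimal sketch (not the paper's exact algorithm): Nesterov-style momentum
# on the *true* policy gradient for a 3-armed bandit with softmax
# parametrization. Rewards, step size, and momentum schedule are assumptions.

rewards = np.array([1.0, 0.8, 0.2])   # true mean rewards r(a); arm 0 is optimal

def softmax(theta):
    z = np.exp(theta - theta.max())   # subtract max for numerical stability
    return z / z.sum()

def grad(theta):
    # dJ/dtheta_a = pi(a) * (r(a) - J(theta)), where J(theta) = E_{a~pi}[r(a)]
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)

eta = 0.4                             # step size (assumed)
theta = np.zeros(3)                   # policy parameters
omega = theta.copy()                  # look-ahead (momentum) iterate

for t in range(1, 2001):
    theta_next = omega + eta * grad(omega)                   # ascent step at the look-ahead point
    omega = theta_next + t / (t + 3) * (theta_next - theta)  # Nesterov extrapolation
    theta = theta_next

print(softmax(theta))                 # probability mass should concentrate on arm 0
```

In this sketch, the gradient step is taken at the extrapolated point `omega` rather than at `theta`, which is the defining feature of Nesterov momentum relative to vanilla gradient ascent on the bandit objective.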
