

Oral

Stochastic Variance-Reduced Policy Gradient

Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli

Abstract:
In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
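The abstract's central idea, an SVRG-style snapshot gradient corrected on small batches, with importance weights compensating for the fact that trajectories are sampled under the current policy rather than the snapshot policy, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `grad_log_policy`, `traj_return`, and `traj_likelihood` are hypothetical placeholders for a per-trajectory REINFORCE-style gradient, the trajectory return, and the trajectory likelihood under given policy parameters.

```python
import numpy as np

# Hedged sketch of a variance-reduced policy-gradient direction with
# importance weights, loosely following the idea described in the abstract.
# All helper callables are assumed to be provided by the user.

def reinforce_grad(theta, trajectories, grad_log_policy, traj_return):
    """Plain Monte Carlo policy-gradient estimate at parameters `theta`.
    In an SVRG-style scheme this would be computed on a large batch at the
    snapshot parameters (only approximately a "full" gradient in RL)."""
    grads = [grad_log_policy(theta, tau) * traj_return(tau)
             for tau in trajectories]
    return np.mean(grads, axis=0)

def svrpg_direction(theta, theta_snapshot, snapshot_grad, mini_batch,
                    grad_log_policy, traj_return, traj_likelihood):
    """Snapshot gradient plus an importance-weighted correction computed on
    a mini-batch of trajectories sampled under the current policy `theta`."""
    correction = np.zeros_like(snapshot_grad)
    for tau in mini_batch:
        g_current = grad_log_policy(theta, tau) * traj_return(tau)
        # The snapshot term is evaluated on trajectories drawn from the
        # current policy, so an importance weight re-targets it to the
        # snapshot policy and keeps the overall estimate unbiased.
        w = traj_likelihood(theta_snapshot, tau) / traj_likelihood(theta, tau)
        g_snapshot = w * grad_log_policy(theta_snapshot, tau) * traj_return(tau)
        correction += (g_current - g_snapshot) / len(mini_batch)
    return snapshot_grad + correction
```

Note that in this setting the snapshot gradient is itself only an approximation built from a finite batch of trajectories, which is one of the complications (point II above) that distinguishes the policy-gradient case from SVRG in supervised learning.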
