
Workshop: Workshop on Reinforcement Learning Theory

Invited Speaker: Bo Dai: Leveraging Non-uniformity in Policy Gradient

Bo Dai


Policy gradient is one of the state-of-the-art families of algorithms in reinforcement learning, and it has been proved to converge globally. Motivated by properties of the accumulated reward in Markov decision processes (MDPs), we propose non-uniform refinements of the smoothness and Łojasiewicz conditions, denoted NS and NŁ respectively. These definitions lead to new geometry-aware first-order policy gradient methods that converge to the global optimum at a linear rate while incurring less overhead than existing algorithms such as natural and mirror policy gradient. Similarly, for generalized linear models (GLMs), we show that geometry-aware normalized gradient descent also achieves a linear convergence rate. Experimental results illustrate and complement the theoretical findings.
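To make the normalized-gradient idea concrete, here is a minimal sketch on the simplest possible case: a single-state softmax bandit with known rewards, where the exact gradient of the expected reward V(θ) = π_θ · r is dV/dθ_a = π_a (r_a − π_θ · r). It assumes that the "geometry-aware" update can be modelled simply as dividing the exact gradient by its Euclidean norm with a fixed, hand-picked step size. The function names, reward vector, and step size are illustrative assumptions, not the authors' algorithm or constants.

```python
import numpy as np


def softmax(theta):
    # Numerically stable softmax over action logits.
    z = np.exp(theta - theta.max())
    return z / z.sum()


def value_and_grad(theta, r):
    # Single-state bandit: V(theta) = pi . r and dV/dtheta_a = pi_a * (r_a - V).
    pi = softmax(theta)
    v = pi @ r
    return v, pi * (r - v)


def suboptimality_gap(r, steps, eta, normalize=True):
    # Hypothetical helper: run exact gradient ascent on the softmax logits,
    # optionally dividing each gradient by its Euclidean norm, and return
    # the final gap max(r) - V(theta).
    theta = np.zeros(len(r))
    for _ in range(steps):
        _, g = value_and_grad(theta, r)
        if normalize:
            norm = np.linalg.norm(g)
            if norm == 0.0:
                break  # stationary point: nothing left to normalize
            theta = theta + eta * g / norm
        else:
            theta = theta + eta * g
    return r.max() - softmax(theta) @ r


r = np.array([1.0, 0.8, 0.1])  # assumed toy rewards; arm 0 is optimal
gap_norm = suboptimality_gap(r, steps=200, eta=0.1, normalize=True)
gap_std = suboptimality_gap(r, steps=200, eta=0.1, normalize=False)
print(f"normalized PG gap: {gap_norm:.3e}, plain PG gap: {gap_std:.3e}")
```

With the same step size and iteration budget, the normalized update keeps taking steps of length `eta` even as the gradient vanishes near the optimal deterministic policy, so in this toy run its gap shrinks far faster than that of plain gradient ascent, whose steps shrink together with the gradient.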