
Invited Speaker: Bo Dai: Leveraging Non-uniformity in Policy Gradient
Bo Dai

Sat Jul 24 02:00 PM -- 02:25 PM (PDT) @

Policy gradient is one of the state-of-the-art algorithm families in reinforcement learning and has been proved to be globally convergent. Motivated by properties of the accumulated reward in MDPs, we propose non-uniform refinements of the smoothness condition (NS) and the Łojasiewicz condition (NŁ). These new definitions inspire geometry-aware first-order policy gradient methods that converge to global optimality at a linear rate while incurring less overhead than existing algorithms, e.g., natural/mirror policy gradient. Similarly, we show that geometry-aware normalized gradient descent can achieve a linear convergence rate in fitting generalized linear models. Experimental results are used to illustrate and complement the theoretical findings.
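The normalized-gradient idea in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the talk's method: it runs plain normalized gradient descent (step direction g/||g||) on a logistic-regression loss, whereas the geometry-aware variant discussed in the talk rescales the step using a problem-dependent non-uniform smoothness factor. All function names and parameters here are hypothetical.

```python
import numpy as np

def normalized_gd_logistic(X, y, eta=0.1, steps=500):
    """Normalized gradient descent on the mean logistic loss.

    Toy sketch: the update direction is g / ||g||, so progress does not
    stall as the raw gradient vanishes near the optimum. The talk's
    geometry-aware method instead scales steps by a non-uniform
    smoothness estimate; that refinement is omitted here.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        g = X.T @ (p - y) / len(y)         # gradient of mean log-loss
        norm = np.linalg.norm(g)
        if norm < 1e-12:                   # gradient vanished; stop
            break
        w -= eta * g / norm                # unit-length (normalized) step
    return w
```

On linearly separable data this drives the training loss toward zero at a rate governed by the fixed step length rather than the shrinking gradient magnitude.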

Author Information

Bo Dai (Google Brain)
