Hessian Aided Policy Gradient
Zebang Shen · Alejandro Ribeiro · Hamed Hassani · Hui Qian · Chao Mi

Wed Jun 12 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #114
Reducing the variance of estimators for policy gradient has long been the focus of reinforcement learning research. While classic algorithms like REINFORCE find an $\epsilon$-approximate first-order stationary point in $\mathcal{O}(1/\epsilon^4)$ random trajectory simulations, no provable improvement on the complexity has been made so far. This paper presents a Hessian aided policy gradient method with the first improved sample complexity of $\mathcal{O}(1/\epsilon^3)$. While our method exploits information from the policy Hessian, it can be implemented in linear time with respect to the parameter dimension and is hence applicable to sophisticated DNN parameterization. Simulations on standard tasks validate the efficiency of our method.
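The linear-time claim rests on a standard observation: a Hessian-vector product $Hv$ can be obtained from gradient evaluations alone, so the full $d \times d$ Hessian is never formed. The sketch below is illustrative only (it is not the paper's estimator): it uses a toy quadratic objective with a hand-written analytic gradient, and approximates $Hv$ by a finite difference of two gradient calls, which costs $\mathcal{O}(d)$ per product.

```python
import numpy as np

# Illustrative sketch (NOT the paper's exact estimator): a Hessian-vector
# product Hv can be estimated from two gradient evaluations, keeping the
# cost linear in the parameter dimension d. The toy objective is
# f(theta) = 0.5 * theta^T A theta, whose gradient is A @ theta and
# whose Hessian is A, so we can check the approximation against A @ v.

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
A = A + A.T  # symmetrize so A is a valid Hessian

def grad(theta):
    # analytic gradient of the toy quadratic objective
    return A @ theta

def hvp(theta, v, eps=1e-6):
    # finite difference of gradients along direction v:
    # Hv ~= (grad(theta + eps*v) - grad(theta)) / eps
    # Two gradient calls -> O(d) cost; no d x d matrix is built.
    return (grad(theta + eps * v) - grad(theta)) / eps

theta = rng.standard_normal(d)
v = rng.standard_normal(d)
approx = hvp(theta, v)
exact = A @ v  # ground truth for the quadratic case
print(np.allclose(approx, exact, atol=1e-4))  # True
```

In deep-RL practice the same product is typically computed exactly with reverse-mode automatic differentiation (double backprop) rather than finite differences, but the cost remains linear in the number of parameters either way.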

Author Information

Zebang Shen (Zhejiang University)
Alejandro Ribeiro (University of Pennsylvania)
Hamed Hassani (University of Pennsylvania)

I am an assistant professor in the Department of Electrical and Systems Engineering (as of July 2017). I hold a secondary appointment in the Department of Computer and Information Systems. I am also a faculty affiliate of the Warren Center for Network and Data Sciences. Before joining Penn, I was a research fellow at the Simons Institute, UC Berkeley (program: Foundations of Machine Learning). Prior to that, I was a post-doctoral scholar and lecturer in the Institute for Machine Learning at ETH Zürich. I received my Ph.D. degree in Computer and Communication Sciences from EPFL.

Hui Qian (Zhejiang University)
Chao Mi (Zhejiang University)
