Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach
Shuang Wu · Ling Shi · Jun Wang · Guangjian Tian

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #929

The REINFORCE algorithm (Williams, 1992) is popular in policy gradient (PG) methods for solving reinforcement learning (RL) problems, while the theoretical form of PG comes from Sutton et al. (1999). Although both formulae prescribe PG, their precise connection has not yet been made clear. Recently, Nota and Thomas (2020) found that this ambiguity causes implementation errors. Motivated by the ambiguity and the resulting incorrect implementations, we study PG from a perturbation perspective. In particular, we derive PG in a unified framework, precisely clarify the relation between PG implementation and theory, and echo the findings of Nota and Thomas (2020). Examining the factors that contribute to the empirical success of the existing erroneous implementations, we find that small approximation error and the experience replay mechanism play critical roles.

Author Information

Shuang Wu (Huawei Noah's Ark Lab)
Ling Shi (The Hong Kong University of Science and Technology)

Dr. Ling Shi obtained his B.E. from EEE (now ECE), HKUST, in 2002 and Ph.D. from CDS, Caltech, in 2008. He is currently a Professor in the Department of Electronic and Computer Engineering, and the associate director of the Robotics Institute, both at the Hong Kong University of Science and Technology. His research interests include cyber-physical systems security, networked control systems, sensor scheduling, event-based state estimation, and exoskeleton robots. He is a senior member of IEEE. He served as an editorial board member for the European Control Conference 2013-2016. He was a subject editor for International Journal of Robust and Nonlinear Control (2015-2017), an associate editor for IEEE Transactions on Control of Network Systems (2016-2020), an associate editor for IEEE Control Systems Letters (2017-2020), and an associate editor for a special issue on Secure Control of Cyber Physical Systems in IEEE Transactions on Control of Network Systems (2015-2017). He also served as the General Chair of the 23rd International Symposium on Mathematical Theory of Networks and Systems (MTNS 2018). He is a member of the Young Scientists Class 2020 of the World Economic Forum (WEF).

Jun Wang (UCL)
Guangjian Tian (Huawei Noah’s Ark Lab)
