Poster
Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation
Shangtong Zhang · Bo Liu · Hengshuai Yao · Shimon Whiteson

Thu Jul 16, 08:00 AM to 08:45 AM and 08:00 PM to 08:45 PM (PDT) @ Virtual

We present the first provably convergent two-timescale off-policy actor-critic algorithm (COF-PAC) with function approximation. Key to COF-PAC is the introduction of a new critic, the emphasis critic, which is trained via Gradient Emphasis Learning (GEM), a novel combination of the key ideas of Gradient Temporal Difference Learning and Emphatic Temporal Difference Learning. With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.
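The abstract does not give GEM's update rule, so the sketch below illustrates only the emphatic ingredient it builds on. It shows the followon trace and emphasis weighting of standard linear off-policy Emphatic TD(0). This is background, not GEM or COF-PAC. It assumes a continuing task, a constant discount γ, λ = 0 (so the emphasis M_t equals the followon trace F_t), and an interest i(s) that defaults to 1. The class `LinearETD0`, the helper `dot`, and their parameters are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


class LinearETD0:
    """Hypothetical linear off-policy Emphatic TD(0) learner (lambda = 0).

    Background sketch only: this is NOT the GEM update or COF-PAC.
    Assumes a continuing task with a constant discount factor gamma.
    """

    def __init__(self, n_features: int, alpha: float, gamma: float) -> None:
        self.alpha = alpha                     # step size
        self.gamma = gamma                     # discount factor
        self.theta: List[float] = [0.0] * n_features
        self.followon = 0.0                    # F_{t-1}
        self.prev_rho = 0.0                    # rho_{t-1}; 0 makes F_0 = i(S_0)

    def update(self, phi: Sequence[float], reward: float,
               phi_next: Sequence[float], rho: float,
               interest: float = 1.0) -> float:
        """Process one transition S_t -> S_{t+1}.

        rho is the importance ratio pi(A_t | S_t) / mu(A_t | S_t).
        Returns the TD error delta_t.
        """
        # Followon trace: F_t = gamma * rho_{t-1} * F_{t-1} + i(S_t)
        self.followon = self.gamma * self.prev_rho * self.followon + interest
        # With lambda = 0, the emphasis M_t equals F_t
        emphasis = self.followon
        # TD error under the current linear value estimate
        td_error = (reward + self.gamma * dot(self.theta, phi_next)
                    - dot(self.theta, phi))
        # theta <- theta + alpha * rho_t * M_t * delta_t * phi_t
        step = self.alpha * rho * emphasis * td_error
        self.theta = [w + step * x for w, x in zip(self.theta, phi)]
        self.prev_rho = rho
        return td_error
```

For example, with γ = 0.9, a first transition with ρ = 2 followed by a second transition leaves the followon trace at 0.9 · 2 · 1 + 1 = 2.8. The second update is therefore scaled by that emphasis.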

Author Information

Shangtong Zhang (University of Oxford)
Bo Liu (Auburn University)

Bo Liu is a tenure-track assistant professor in the Department of Computer Science and Software Engineering at Auburn University. He obtained his Ph.D. in 2015 from the Autonomous Learning Lab at the University of Massachusetts Amherst, co-led by Drs. Sridhar Mahadevan and Andrew Barto. His primary research areas include decision-making under uncertainty, human-aided machine learning, symbolic AI, and trustworthiness and interpretability in machine learning, along with their applications to big data, autonomous driving, and healthcare informatics. He has more than 30 publications in notable venues such as NIPS, UAI, AAAI, IJCAI, AAMAS, JAIR, IEEE TNNLS, and ACM TECS. His research is funded by NSF, Amazon, Tencent (China), Adobe, and ETRI (South Korea). He received the UAI 2015 Facebook Best Student Paper Award and an Amazon Research Award in 2018. His research results have been featured in the classic textbook "Reinforcement Learning: An Introduction" (2nd edition) and in tutorials at NIPS 2015, IJCAI 2016, and AAAI 2019.

Hengshuai Yao (Huawei Technologies)
Shimon Whiteson (University of Oxford)