
Fast Convergence for Unstable Reinforcement Learning Problems by Logarithmic Mapping
Wang Zhang · Lam Nguyen · Subhro Das · Alexandre Megretsky · Luca Daniel · Tsui-Wei Weng
Event URL: https://openreview.net/forum?id=EVqdPBvYrvg

In many reinforcement learning applications, the system is assumed to be inherently stable, with bounded rewards and bounded state and action spaces. These assumptions are key requirements for the convergence of optimization under the classical discounted reward formulation. Unfortunately, they do not hold for many real-world problems, such as an unstable linear–quadratic regulator (LQR). In this work, we propose new methods to stabilize and accelerate the convergence of policy gradient methods on unstable reinforcement learning problems. We provide theoretical insights into the efficiency of our methods. In experiments on multiple examples where the vanilla methods mostly fail to converge because of system instability, our methods achieve good results.
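To make the instability concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' method. It rolls out a scalar linear system x_{t+1} = a x_t + b u_t under a fixed linear policy u_t = -k x_t with quadratic cost q x_t^2 + r u_t^2. When |a - bk| > 1, the per-step cost grows like (a - bk)^{2t}, so the discounted sum diverges whenever γ(a - bk)^2 ≥ 1. Applying a logarithmic map turns this exponential growth into linear growth in t, and the discounted sum of a linearly growing sequence stays bounded for any γ < 1. The mapping log(1 + c) is our assumption, since the abstract does not specify the exact mapping. The function names and parameter values are also hypothetical.

```python
import math


def rollout_costs(a, b, k, x0, q, r, horizon):
    """Per-step quadratic costs of a scalar linear system under u_t = -k * x_t."""
    costs = []
    x = x0
    for _ in range(horizon):
        u = -k * x
        costs.append(q * x * x + r * u * u)
        # Closed-loop update: x_{t+1} = (a - b*k) * x_t, unstable if |a - b*k| > 1.
        x = a * x + b * u
    return costs


def discounted_sum(values, gamma):
    """Sum_t gamma^t * values[t] over a finite horizon."""
    return sum(gamma ** t * v for t, v in enumerate(values))


def log_mapped(costs):
    # Hypothetical logarithmic mapping c -> log(1 + c): an exponentially
    # growing cost becomes (asymptotically) linear in t.
    return [math.log1p(c) for c in costs]


if __name__ == "__main__":
    gamma = 0.99
    # Closed-loop gain a - b*k = 1.3, so gamma * 1.3**2 > 1: the raw sum diverges.
    costs = rollout_costs(a=1.5, b=1.0, k=0.2, x0=1.0, q=1.0, r=1.0, horizon=1000)
    for horizon in (100, 200, 1000):
        raw = discounted_sum(costs[:horizon], gamma)
        logged = discounted_sum(log_mapped(costs[:horizon]), gamma)
        print(f"T={horizon}: raw={raw:.3e}  log-mapped={logged:.1f}")
```

With these values, doubling the horizon from 100 to 200 multiplies the raw discounted cost by many orders of magnitude. The log-mapped sum grows only by a small constant factor and levels off as the horizon increases. The horizon is kept at 1000 because the raw cost itself would overflow a double-precision float a few hundred steps later.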

Author Information

Wang Zhang (MIT)
Lam Nguyen (IBM Research, Thomas J. Watson Research Center)
Subhro Das (MIT-IBM Watson AI Lab, IBM Research)

Subhro Das is a Research Staff Member and Manager at the MIT-IBM AI Lab, IBM Research, Cambridge, MA. As a Principal Investigator (PI), he works on developing novel AI algorithms in collaboration with MIT. He is a Research Affiliate at MIT, co-leading IBM's engagement in the MIT Quest for Intelligence. He serves as the Chair of the AI Learning Professional Interest Community (PIC) at IBM Research. His research interests are broadly in the areas of Trustworthy ML, Reinforcement Learning, and ML Optimization. At the MIT-IBM AI Lab, his work covers algorithms for uncertainty quantification and human-centric AI systems; robust, accelerated, online, and distributed optimization; and safe, unstable, and multi-agent reinforcement learning. He led the Future of Work initiative within IBM Research, studying the impact of AI on the labor market and developing AI-driven recommendation frameworks for skills and talent management. Previously, at the IBM T.J. Watson Research Center in New York, he worked on developing predictive algorithms based on signal processing and machine learning for a broad range of biomedical and healthcare applications. He received his MS and PhD degrees in Electrical and Computer Engineering from Carnegie Mellon University in 2014 and 2016, respectively, and his Bachelor's (B.Tech.) degree in Electronics & Electrical Communication Engineering from the Indian Institute of Technology Kharagpur in 2011.

Alexandre Megretsky (Massachusetts Institute of Technology)
Luca Daniel (Massachusetts Institute of Technology)
Tsui-Wei Weng (MIT)
