Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity
Kaiqing Zhang · Xiangyuan Zhang · Bin Hu · Tamer Basar

Policy-based model-free reinforcement learning (RL) methods have shown great promise for continuous control applications. However, their performance on risk-sensitive/robust control tasks is not yet fully understood, and has generally been regarded as an important open problem since the seminal work of Fazel et al. (2018). We take a step toward addressing this open problem by providing the first sample complexity results for policy gradient (PG) methods in two fundamental risk-sensitive/robust control settings: the linear exponential quadratic Gaussian problem and the linear-quadratic (LQ) disturbance attenuation problem. The optimization landscapes of these problems are inherently more challenging than that of the LQ regulator, due to the lack of coercivity of their objective functions. To overcome this challenge, we obtain the first implicit regularization results for model-free PG methods, certifying that the controller remains robust throughout the learning process, which in turn leads to the sample complexity guarantees. As a by-product, our results also provide the first sample complexity guarantees for PG methods in two-player zero-sum LQ dynamic games, a baseline setting in multi-agent RL.
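For intuition, the following is a minimal sketch of a derivative-free (zeroth-order) policy gradient update of the kind studied in this line of work, applied to a generic LQ cost with a linear state-feedback gain K. The rollout horizon, smoothing radius, step size, and the one-point gradient estimator below are illustrative assumptions, not the authors' exact algorithm or guarantees.

```python
import numpy as np

def rollout_cost(K, A, B, Q, R, x0, horizon=50, sigma_w=0.1, rng=None):
    """Simulated finite-horizon LQ cost of the state-feedback law u_t = -K x_t."""
    rng = np.random.default_rng() if rng is None else rng
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + sigma_w * rng.standard_normal(x.shape)
    return cost

def zeroth_order_pg(K, A, B, Q, R, x0, iters=200, r=0.05, lr=1e-4, n_samples=20):
    """Derivative-free PG: estimate grad J(K) from perturbed rollouts, then descend."""
    rng = np.random.default_rng(0)
    d = K.size
    for _ in range(iters):
        grad = np.zeros_like(K)
        for _ in range(n_samples):
            U = rng.standard_normal(K.shape)
            U *= r / np.linalg.norm(U)                 # random direction on the radius-r sphere
            c = rollout_cost(K + U, A, B, Q, R, x0, rng=rng)
            grad += (d / (n_samples * r**2)) * c * U   # one-point smoothed gradient estimate
        K = K - lr * grad                              # policy gradient (descent) step
    return K
```

In practice the initial gain K must already be stabilizing (or, in the risk-sensitive/robust settings of the paper, satisfy the relevant robustness condition), which is exactly what the implicit regularization results certify is preserved across iterations.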

Author Information

Kaiqing Zhang (University of Illinois at Urbana-Champaign/MIT)
Xiangyuan Zhang (University of Illinois at Urbana-Champaign)
Bin Hu (University of Illinois at Urbana-Champaign)
Tamer Basar (University of Illinois at Urbana-Champaign)
