The linear quadratic regulator (LQR) is one of the most popular frameworks for tackling continuous Markov decision process tasks. With its well-developed theory and tractable optimal policy, LQR has been revisited and analyzed in recent years in reinforcement learning scenarios, both model-based and model-free. In this paper, we introduce Structured Policy Iteration (S-PI) for LQR, a method that derives a structured linear policy. A structured policy with (block) sparsity or low rank has significant advantages over the standard LQR policy: it is more interpretable, more memory-efficient, and better suited to distributed settings. To derive such a policy, we first formulate a regularized LQR problem for the known-model setting. Our S-PI algorithm then solves this regularized LQR problem efficiently by alternating between a policy evaluation step and a policy improvement step. We further extend S-PI to the model-free setting, where a smoothing procedure is adopted to estimate the gradient. In both the known-model and model-free settings, we prove convergence under a proper choice of parameters. Finally, experiments demonstrate that S-PI balances LQR performance against the level of structure as the regularization weight is varied.
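To make the alternation described above concrete, below is a minimal NumPy/SciPy sketch of one known-model S-PI step with an entrywise lasso regularizer: policy evaluation solves two discrete Lyapunov equations for the value and covariance matrices, and policy improvement takes a proximal gradient step on the regularized LQR cost. The choice of regularizer, the fixed step size, and the function names (`s_pi_step`, `soft_threshold`) are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of one S-PI iteration for regularized LQR with policy u_t = -K x_t.
# Assumptions: lasso regularizer, standard LQR policy-gradient expression,
# small fixed step size (the paper's algorithm may choose step sizes differently).
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1 (entrywise soft-thresholding)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def s_pi_step(A, B, Q, R, K, Sigma0, lam, eta):
    """One policy-evaluation + policy-improvement step."""
    A_cl = A - B @ K  # closed-loop dynamics
    # Policy evaluation: P_K = A_cl^T P_K A_cl + Q + K^T R K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    # State covariance: Sigma_K = A_cl Sigma_K A_cl^T + Sigma0
    Sigma = solve_discrete_lyapunov(A_cl, Sigma0)
    # Policy gradient of the (unregularized) LQR cost f(K)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    # Policy improvement: proximal gradient step on f(K) + lam * ||K||_1
    return soft_threshold(K - eta * grad, eta * lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 4, 2
    A = rng.standard_normal((n, n))
    A = 0.9 * A / np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius 0.9, so K = 0 is stabilizing
    B = rng.standard_normal((n, m))
    Q, R, Sigma0 = np.eye(n), np.eye(m), np.eye(n)
    K = np.zeros((m, n))
    for _ in range(200):
        K = s_pi_step(A, B, Q, R, K, Sigma0, lam=0.1, eta=1e-3)
    print("zero entries in K:", int(np.sum(K == 0.0)))
```

Larger values of the weight `lam` push more entries of K to exactly zero at the cost of higher LQR cost, which is the performance/structure trade-off the experiments study.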
Author Information
Youngsuk Park (Stanford University)
Ryan A. Rossi (Adobe Research)
Zheng Wen (DeepMind)
Gang Wu (Adobe Research)
Handong Zhao (Adobe Research)
More from the Same Authors
- 2022 Poster: One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes
  Aravind Reddy · Ryan A. Rossi · Zhao Song · Anup Rao · Tung Mai · Nedim Lipka · Gang Wu · Eunyee Koh · Nesreen K Ahmed
- 2022 Spotlight: One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes
  Aravind Reddy · Ryan A. Rossi · Zhao Song · Anup Rao · Tung Mai · Nedim Lipka · Gang Wu · Eunyee Koh · Nesreen K Ahmed
- 2021 Poster: Asymptotics of Ridge Regression in Convolutional Models
  Mojtaba Sahraee-Ardakan · Tung Mai · Anup Rao · Ryan A. Rossi · Sundeep Rangan · Alyson Fletcher
- 2021 Spotlight: Asymptotics of Ridge Regression in Convolutional Models
  Mojtaba Sahraee-Ardakan · Tung Mai · Anup Rao · Ryan A. Rossi · Sundeep Rangan · Alyson Fletcher
- 2021 Poster: Joint Online Learning and Decision-making via Dual Mirror Descent
  Alfonso Lobos Ruiz · Paul Grigas · Zheng Wen
- 2021 Spotlight: Joint Online Learning and Decision-making via Dual Mirror Descent
  Alfonso Lobos Ruiz · Paul Grigas · Zheng Wen
- 2021 Poster: Fundamental Tradeoffs in Distributionally Adversarial Training
  Mohammad Mehrabi · Adel Javanmard · Ryan A. Rossi · Anup Rao · Tung Mai
- 2021 Spotlight: Fundamental Tradeoffs in Distributionally Adversarial Training
  Mohammad Mehrabi · Adel Javanmard · Ryan A. Rossi · Anup Rao · Tung Mai
- 2020 Poster: Budgeted Online Influence Maximization
  Pierre Perrault · Jennifer Healey · Zheng Wen · Michal Valko
- 2020 Poster: Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
  Tong Yu · Branislav Kveton · Zheng Wen · Ruiyi Zhang · Ole J. Mengshoel