ICML Optimal Dynamic Regret in LQR Control

Poster in Workshop: Responsible Decision Making in Dynamic Environments

Optimal Dynamic Regret in LQR Control

Dheeraj Baby · Yu-Xiang Wang

Abstract: We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(n^{1/3} \TV(M_{1:n})^{2/3} \vee 1)$, where $\TV(M_{1:n})$ is the total variation of any oracle sequence of \emph{Disturbance Action} policies parameterized by $M_1, \ldots, M_n$, chosen in hindsight to cater to unknown nonstationarity. The rate improves on the best known rate of $\tilde{O}(\sqrt{n (\TV(M_{1:n})+1)})$ for general convex losses and is information-theoretically optimal for LQR. The main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new \emph{proper} learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of ``minibatched'' quadratic losses, which could be of independent interest.
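To make the comparison between the two rates concrete, the following is a minimal Python sketch, not the authors' algorithm. It computes the total variation of a hypothetical piecewise-constant parameter sequence and evaluates both rate expressions with constants and logarithmic factors dropped. The choice of the l1 norm for the total variation, the flattening of each $M_t$ into a plain list, and the function names are all assumptions made for illustration.

```python
import math


def total_variation(params):
    """Sum of distances between consecutive parameter vectors.

    Assumption: each M_t is flattened to a list of floats and the
    distance is the l1 norm; the abstract does not fix the norm.
    """
    return sum(
        sum(abs(a - b) for a, b in zip(cur, prev))
        for prev, cur in zip(params, params[1:])
    )


def new_rate(n, tv):
    """n^{1/3} * TV^{2/3} vee 1, ignoring constants and log factors."""
    return max(n ** (1 / 3) * tv ** (2 / 3), 1.0)


def prior_rate(n, tv):
    """sqrt(n * (TV + 1)), ignoring constants and log factors."""
    return math.sqrt(n * (tv + 1))


# Hypothetical oracle sequence: constant, with a single switch halfway.
n = 1000
params = [[0.0, 0.0]] * 500 + [[1.0, -1.0]] * 500
tv = total_variation(params)

print(f"TV = {tv}")
print(f"new rate   ~ {new_rate(n, tv):.2f}")
print(f"prior rate ~ {prior_rate(n, tv):.2f}")
```

Since $n^{1/3} C^{2/3} \le \sqrt{nC}$ exactly when $C \le n$, the first expression is no larger than the second whenever the total variation is at most $n$, which is the regime where sublinear regret is possible at all.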
