Online and non-stochastic control

Elad Hazan · Karan Singh

In recent years, new methods have emerged in control and reinforcement learning that incorporate techniques from regret minimization and online convex optimization. The resulting theory gives rise to provable guarantees for some longstanding questions in control and reinforcement learning: logarithmic regret and fast rates, end-to-end LQG-LQR without system knowledge, Kalman filtering with adversarial noise, black-box control with provable finite-time guarantees, tight lower bounds for system identification, and more.
The main innovation in these results is an online control model that replaces stochastic perturbations with adversarial ones, and replaces the goal of optimal control with regret minimization. We will describe this setting, as well as new gradient-based methods that rely on novel convex relaxations.
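To make the setting concrete, here is a minimal sketch, not the tutorial's own code. It assumes a scalar linear system x_{t+1} = a*x_t + b*u_t + w_t with known (a, b), an arbitrary (possibly adversarial) disturbance sequence w_t, and a fixed gain k with |a - b*k| < 1. The controller uses a disturbance-action parameterization in the spirit of the gradient perturbation controller from the online control literature: u_t = -k*x_t + sum_j M[j]*w_{t-1-j}. Because the truncated "counterfactual" state and control are linear in M, the surrogate cost y^2 + v^2 is convex in M, so M can be updated by projected online gradient descent. The function name, the quadratic cost, the truncation length, and the box projection are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def run_disturbance_action_ogd(
    a: float,
    b: float,
    k: float,
    disturbances: Sequence[float],
    history: int = 5,
    lr: float = 0.01,
    radius: float = 1.0,
) -> Tuple[List[float], float]:
    """Hypothetical scalar sketch of online control against arbitrary disturbances.

    Plays u_t = -k*x_t + sum_j M[j] * w_{t-1-j} and updates M by projected
    online gradient descent on a truncated surrogate cost that is convex in M.
    Assumes (a, b) known and |a - b*k| < 1. Returns (M, sum_t x_t^2 + u_t^2).
    """
    H = history
    a_k = a - b * k                      # closed-loop coefficient under u = -k*x
    M = [0.0] * H                        # disturbance-action parameters
    past_w: List[float] = []             # recovered disturbances w_0, w_1, ...
    x = 0.0
    total_cost = 0.0

    def w_at(s: int) -> float:
        # Disturbance w_s, treated as 0 before time 0 (or if not yet observed).
        return past_w[s] if 0 <= s < len(past_w) else 0.0

    def u_dist(s: int) -> float:
        # Disturbance-driven part of the control at time s for the current M.
        return sum(M[j] * w_at(s - 1 - j) for j in range(H))

    for t, w in enumerate(disturbances):
        # 1. Play the disturbance-action control and pay the true cost.
        u = -k * x + u_dist(t)
        total_cost += x * x + u * u
        x_next = a * x + b * u + w
        # 2. Recover w_t from the known dynamics (only observed after the fact).
        past_w.append(x_next - a * x - b * u)
        x = x_next

        # 3. Truncated counterfactual state y_t(M) = sum_{i<H} a_k^i (w_s + b*u_dist_s),
        #    with s = t-1-i. It is affine in M with gradient g.
        y = 0.0
        g = [0.0] * H
        for i in range(H):
            coef = a_k ** i
            s = t - 1 - i
            y += coef * (w_at(s) + b * u_dist(s))
            for j in range(H):
                g[j] += coef * b * w_at(s - 1 - j)
        # Counterfactual control v_t(M) = -k*y_t(M) + u_dist_t(M), also affine in M.
        h = [w_at(t - 1 - j) for j in range(H)]
        v = -k * y + sum(M[j] * h[j] for j in range(H))

        # 4. Projected gradient step on the convex surrogate f(M) = y^2 + v^2,
        #    projecting onto the box [-radius, radius]^H.
        grads = [2 * y * g[j] + 2 * v * (-k * g[j] + h[j]) for j in range(H)]
        M = [min(radius, max(-radius, M[j] - lr * grads[j])) for j in range(H)]

    return M, total_cost


if __name__ == "__main__":
    # Example: a bounded, non-random disturbance sequence.
    ws = [0.5 * math.sin(0.3 * t) for t in range(200)]
    M, cost = run_disturbance_action_ogd(a=0.9, b=1.0, k=0.5, disturbances=ws)
    print("M =", M, "cost =", cost)
```

The design choice worth noting is that the learner never commits to a model of the noise: it only observes each w_t after the fact and adjusts M so that, in hindsight, the policy class would have handled those disturbances well. This is the regret-minimization view, as opposed to optimizing an expected cost under an assumed noise distribution.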
