Timezone: »

Dynamic Regret of Online Markov Decision Processes
Peng Zhao · Long-Fei Li · Zhi-Hua Zhou

Tue Jul 19 03:30 PM -- 05:30 PM (PDT) @ Hall E #800

We investigate online Markov Decision Processes~(MDPs) with adversarially changing loss functions and known transitions. We choose \emph{dynamic regret} as the performance measure, defined as the performance difference between the learner and any sequence of feasible \emph{changing} policies. The measure is strictly stronger than the standard static regret that benchmarks the learner's performance with a fixed compared policy. We consider three foundational models of online MDPs, including episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs. For the three models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees respectively, in which the results for episodic (loop-free) SSP are provably minimax optimal in terms of time horizon and certain non-stationarity measure.

Author Information

Peng Zhao (Nanjing University)
Long-Fei Li (Nanjing University)
Zhi-Hua Zhou (Nanjing University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors