We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies. This measure is strictly stronger than the standard static regret, which benchmarks the learner's performance against a single fixed comparator policy. We consider three foundational models of online MDPs: episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs. For each of the three models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees. The results for episodic (loop-free) SSP are provably minimax optimal in terms of the time horizon and a certain non-stationarity measure.
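The following minimal Python sketch illustrates the difference between the two regret notions. It abstracts away the MDP structure and assumes, for illustration only, a finite set of candidate policies whose expected loss in each episode is already known, given as `losses[t][k]`. In the known-transition setting, one common way to obtain such a value is the inner product of a policy's occupancy measure with that episode's loss function. The names `cumulative_loss`, `dynamic_regret`, and `static_regret` are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# Hypothetical toy model: losses[t][k] is the expected loss of candidate
# policy k in episode t (the MDP structure itself is abstracted away).
Losses = Sequence[Sequence[float]]


def cumulative_loss(losses: Losses, choices: Sequence[int]) -> float:
    """Total loss when policy choices[t] is played in episode t."""
    return sum(row[k] for row, k in zip(losses, choices))


def dynamic_regret(losses: Losses, learner: Sequence[int],
                   comparators: Sequence[int]) -> float:
    """Learner's loss minus that of an arbitrary (changing) comparator sequence."""
    return cumulative_loss(losses, learner) - cumulative_loss(losses, comparators)


def static_regret(losses: Losses, learner: Sequence[int]) -> float:
    """Learner's loss minus that of the best single fixed policy in hindsight."""
    horizon, num_policies = len(losses), len(losses[0])
    best_fixed = min(cumulative_loss(losses, [k] * horizon)
                     for k in range(num_policies))
    return cumulative_loss(losses, learner) - best_fixed


if __name__ == "__main__":
    # Policy 0 is good in the first half, policy 1 in the second half.
    losses = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    learner = [0, 0, 0, 0]
    print(static_regret(losses, learner))                 # 0.0
    print(dynamic_regret(losses, learner, [0, 0, 1, 1]))  # 2.0
    print(dynamic_regret(losses, learner, [1, 1, 1, 1]))  # 0.0
```

In this toy instance, the better policy switches halfway through. As a result, the learner's static regret is 0, while its dynamic regret against the switching comparator sequence [0, 0, 1, 1] is 2. A constant comparator sequence recovers static regret as a special case, which is why a dynamic regret bound is the stronger guarantee.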
Author Information
Peng Zhao (Nanjing University)
Long-Fei Li (Nanjing University)
Zhi-Hua Zhou (Nanjing University)
Related Events (a corresponding poster, oral, or spotlight)
2022 Poster: Dynamic Regret of Online Markov Decision Processes
Tue, Jul 19 through Wed, Jul 20, Room Hall E #800
More from the Same Authors
2022: Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits
Yulian Wu · Youming Tao · Peng Zhao · Di Wang
2023 Poster: Identifying Useful Learnwares for Heterogeneous Label Spaces
Lan-Zhe Guo · Zhi Zhou · Yu-Feng Li · Zhi-Hua Zhou
2023 Poster: Fast Rates in Time-Varying Strongly Monotone Games
Yu-Hu Yan · Peng Zhao · Zhi-Hua Zhou
2023 Poster: Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization
Sijia Chen · Wei-Wei Tu · Peng Zhao · Lijun Zhang
2023 Poster: Estimating Possible Causal Effects with Latent Variables via Adjustment
Tian-Zuo Wang · Tian Qin · Zhi-Hua Zhou
2022 Poster: No-Regret Learning in Time-Varying Zero-Sum Games
Mengxiao Zhang · Peng Zhao · Haipeng Luo · Zhi-Hua Zhou
2022 Spotlight: No-Regret Learning in Time-Varying Zero-Sum Games
Mengxiao Zhang · Peng Zhao · Haipeng Luo · Zhi-Hua Zhou
2021 Poster: Budgeted Heterogeneous Treatment Effect Estimation
Tian Qin · Tian-Zuo Wang · Zhi-Hua Zhou
2021 Spotlight: Budgeted Heterogeneous Treatment Effect Estimation
Tian Qin · Tian-Zuo Wang · Zhi-Hua Zhou
2020 Poster: Cost-effectively Identifying Causal Effects When Only Response Variable is Observable
Tian-Zuo Wang · Xi-Zhu Wu · Sheng-Jun Huang · Zhi-Hua Zhou
2020 Poster: Learning with Feature and Distribution Evolvable Streams
Zhen-Yu Zhang · Peng Zhao · Yuan Jiang · Zhi-Hua Zhou
2019 Poster: Adaptive Regret of Convex and Smooth Functions
Lijun Zhang · Tie-Yan Liu · Zhi-Hua Zhou
2019 Oral: Adaptive Regret of Convex and Smooth Functions
Lijun Zhang · Tie-Yan Liu · Zhi-Hua Zhou
2019 Poster: Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin
Xi-Zhu Wu · Song Liu · Zhi-Hua Zhou
2019 Oral: Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin
Xi-Zhu Wu · Song Liu · Zhi-Hua Zhou
2018 Poster: Rectify Heterogeneous Models with Semantic Mapping
Han-Jia Ye · De-Chuan Zhan · Yuan Jiang · Zhi-Hua Zhou
2018 Poster: Dynamic Regret of Strongly Adaptive Methods
Lijun Zhang · Tianbao Yang · Rong Jin · Zhi-Hua Zhou
2018 Oral: Rectify Heterogeneous Models with Semantic Mapping
Han-Jia Ye · De-Chuan Zhan · Yuan Jiang · Zhi-Hua Zhou
2018 Oral: Dynamic Regret of Strongly Adaptive Methods
Lijun Zhang · Tianbao Yang · Rong Jin · Zhi-Hua Zhou
2017 Poster: A Unified View of Multi-Label Performance Measures
Xi-Zhu Wu · Zhi-Hua Zhou
2017 Talk: A Unified View of Multi-Label Performance Measures
Xi-Zhu Wu · Zhi-Hua Zhou