Poster
Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs
Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for non-stationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of $\widetilde{O}(S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H T^{\frac{2}{3}})$, where $S$ and $A$ are the numbers of states and actions, respectively, $\Delta>0$ is the variation budget, $H$ is the number of time steps per episode, and $T$ is the total number of time steps. We further show that our algorithm is nearly optimal by establishing an information-theoretical lower bound of $\Omega(S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H^{\frac{2}{3}} T^{\frac{2}{3}})$, the first lower bound in non-stationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We further demonstrate the power of our results in the context of multi-agent RL, where non-stationarity is a key challenge.
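To make "nearly optimal" concrete, the two rates stated in the abstract can be compared numerically. The sketch below is only an illustration: the function names are hypothetical, and constants and logarithmic factors hidden by $\widetilde{O}$ and $\Omega$ are dropped, so the values are leading-order terms rather than actual regret figures.

```python
def upper_rate(S, A, delta, H, T):
    """Leading term of the stated upper bound S^(1/3) A^(1/3) Delta^(1/3) H T^(2/3).

    Constants and log factors are ignored (an assumption of this sketch).
    """
    return (S * A * delta) ** (1 / 3) * H * T ** (2 / 3)


def lower_rate(S, A, delta, H, T):
    """Leading term of the stated lower bound, with H^(2/3) in place of H."""
    return (S * A * delta) ** (1 / 3) * H ** (2 / 3) * T ** (2 / 3)


def gap(S, A, delta, H, T):
    """Ratio of the two leading terms; algebraically this equals H^(1/3)."""
    return upper_rate(S, A, delta, H, T) / lower_rate(S, A, delta, H, T)


if __name__ == "__main__":
    # Example values chosen only for illustration.
    S, A, delta, H, T = 10, 4, 2.0, 8, 10**6
    print("upper:", upper_rate(S, A, delta, H, T))
    print("lower:", lower_rate(S, A, delta, H, T))
    print("gap:  ", gap(S, A, delta, H, T))
```

Under these simplifications the ratio depends only on $H$: the upper and lower bounds match in $S$, $A$, $\Delta$, and $T$, and differ by a factor of $H^{\frac{1}{3}}$.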
Author Information
Weichao Mao (University of Illinois at Urbana-Champaign)
Kaiqing Zhang (MIT)
Ruihao Zhu (MIT)
David Simchi-Levi (MIT)
Tamer Basar (University of Illinois at Urbana-Champaign)
Related Events (a corresponding poster, oral, or spotlight)

2021 Spotlight: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs
Thu Jul 22nd 02:20 - 02:25 AM
More from the Same Authors

2021 Poster: Dynamic Planning and Learning under Recovering Rewards
David Simchi-Levi · Zeyu Zheng · Feng Zhu
2021 Spotlight: Dynamic Planning and Learning under Recovering Rewards
David Simchi-Levi · Zeyu Zheng · Feng Zhu
2021 Poster: Reinforcement Learning for Cost-Aware Markov Decision Processes
Wesley Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer 
2021 Spotlight: Reinforcement Learning for Cost-Aware Markov Decision Processes
Wesley Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer 
2020 Poster: Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
Wang Chi Cheung · David Simchi-Levi · Ruihao Zhu
2020 Poster: Online Pricing with Offline Data: Phase Transition and Inverse Square Law
Jinzhi Bu · David Simchi-Levi · Yunzong Xu
2018 Poster: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar 
2018 Oral: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar