Timezone: »
Poster
Dynamic Planning and Learning under Recovering Rewards
David Simchi-Levi · Zeyu Zheng · Feng Zhu
Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce a general class of multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from at most $K$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the idle time increases. With the objective of maximizing expected cumulative rewards over $T$ time periods, we propose, construct and prove performance guarantees for a class of ``Purely Periodic Policies''. For the offline problem when all model parameters are known, our proposed policy obtains an approximation ratio that is at the order of $1-\mathcal O(1/\sqrt{K})$, which is asymptotically optimal when $K$ grows to infinity. For the online problem when the model parameters are unknown and need to be learned, we design an Upper Confidence Bound (UCB) based policy that approximately has $\widetilde\mathcal O(N\sqrt{T})$ regret against the offline benchmark. Our framework and policy design may have the potential to be adapted into other offline planning and online learning applications with non-stationary and recovering rewards.
Author Information
David Simchi-Levi (MIT)
Zeyu Zheng (University of California, Berkeley)
Feng Zhu (Massachusetts Institute of Technology)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Dynamic Planning and Learning under Recovering Rewards »
Thu. Jul 22nd 12:20 -- 12:25 AM Room
More from the Same Authors
-
2023 Poster: Pricing Experimental Design: Causal Effect, Expected Revenue and Tail Risk »
David Simchi-Levi · Chonghuan Wang -
2021 Poster: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs »
Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar -
2021 Spotlight: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs »
Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar -
2020 Poster: When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment »
Feng Zhu · Zeyu Zheng -
2020 Poster: Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism »
Wang Chi Cheung · David Simchi-Levi · Ruihao Zhu -
2020 Poster: Online Pricing with Offline Data: Phase Transition and Inverse Square Law »
Jinzhi Bu · David Simchi-Levi · Yunzong Xu