Oral

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

Ping-Chun Hsieh ⋅ Xi Liu ⋅ Anirban Bhattacharya ⋅ P R Kumar

2019 Oral

[ Slides] [ Video]

Abstract

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a ``reneging'' phenomenon, where participants may disengage from future interactions after observing an unsatisfiable outcome, is rather prevalent. To address the above issue, this paper proposes a model of heteroscedastic linear bandits with reneging. The model allows each participant to have a distinct ``satisfaction level," with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, called HR-UCB, and prove that with high probability it achieves $\mathcal{O}\Big(\sqrt{{T}}\big(\log({T})\big)^{3/2}\Big)$ regret. Finally, we validate the performance of HR-UCB via simulations.

Chat is not available.