Timezone: »

Stochastic Rising Bandits
Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli

Wed Jul 20 10:55 AM -- 11:00 AM (PDT) @ Room 327 - 329
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset. Finally, using synthetic and real-world data, we illustrate the effectiveness of the proposed approaches compared with state-of-the-art algorithms for the non-stationary bandits.

Author Information

Alberto Maria Metelli (Politecnico di Milano)
Francesco Trovò (Politecnico di Milano)
Matteo Pirola (Politecnico di Milano)
Marcello Restelli (Politecnico di Milano)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors