Timezone: »

Stochastic Rising Bandits for Online Model Selection
Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms’ expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\mathcal{O}(T^{2/3})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset. Finally, using synthetic and real-world data, we illustrate the effectiveness of the proposed approaches compared with state-of-the-art algorithms for the non-stationary bandits.

Author Information

Alberto Maria Metelli (Politecnico di Milano)
Francesco Trovò (Politecnico di Milano)
Matteo Pirola (Politecnico di Milano)
Marcello Restelli (Politecnico di Milano)

More from the Same Authors