ICML Poster Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards

Shiyin Lu · Guanghui Wang · Yao Hu · Lijun Zhang

Pacific Ballroom #121

Keywords: [ Bandits ] [ Online Learning ]

Abstract: We study Lipschitz bandits, where a learner repeatedly plays one arm from an infinite arm set and then receives a stochastic reward whose expectation is a Lipschitz function of the chosen arm. Most existing work assumes that the reward distributions are bounded or at least sub-Gaussian, and thus does not apply to the heavy-tailed rewards that arise in many real-world scenarios such as web advertising and financial markets. To address this limitation, in this paper we relax the assumption on rewards to allow arbitrary distributions that have finite $(1+\epsilon)$-th moments for some $\epsilon \in (0, 1]$, and propose algorithms that enjoy a sublinear regret of $\widetilde{O}(T^{(d_z\epsilon + 1)/(d_z\epsilon + \epsilon + 1)})$, where $T$ is the time horizon and $d_z$ is the zooming dimension. The key idea is to exploit the Lipschitz property of the expected reward function by adaptively discretizing the arm set, and to employ upper confidence bound policies with robust mean estimators designed for heavy-tailed distributions. Furthermore, we provide a lower bound for Lipschitz bandits with heavy-tailed rewards and show that our algorithms are optimal in terms of $T$. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms.
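
The combination of adaptive discretization and robust upper confidence bounds can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a one-dimensional arm set $[0, 1]$ with metric $|x - y|$, a zooming-style activation rule, and the truncated empirical mean as the robust estimator (the abstract does not say which estimators are used). The names `Arm`, `find_uncovered`, `zooming_truncated_ucb`, and `heavy_tailed_reward`, the moment bound `u`, the confidence parameter `delta`, and the constant 4 in the confidence radius are all hypothetical choices made for illustration.

```python
import math
import random


class Arm:
    """Hypothetical helper: an active arm at point x in [0, 1]."""

    def __init__(self, x):
        self.x = x
        self.n = 0            # number of times this arm has been played
        self.trunc_sum = 0.0  # sum of the samples that survived truncation

    def update(self, reward, u, eps, log_inv_delta):
        # The s-th sample is kept only if |reward| <= (u * s / log(1/delta))^(1/(1+eps)).
        # The threshold depends on s, not on the current count, so the sum updates in O(1).
        self.n += 1
        threshold = (u * self.n / log_inv_delta) ** (1.0 / (1.0 + eps))
        if abs(reward) <= threshold:
            self.trunc_sum += reward

    def mean(self):
        # Truncated empirical mean: surviving samples divided by all n samples.
        return self.trunc_sum / self.n if self.n else 0.0

    def radius(self, u, eps, log_inv_delta):
        # Assumed confidence width of the truncated mean; infinite before the first play.
        if self.n == 0:
            return math.inf
        return 4.0 * u ** (1.0 / (1.0 + eps)) * (log_inv_delta / self.n) ** (eps / (1.0 + eps))


def find_uncovered(arms, u, eps, log_inv_delta):
    """Return a point of [0, 1] outside every ball [x - r, x + r], or None if all is covered."""
    intervals = []
    for a in arms:
        r = a.radius(u, eps, log_inv_delta)
        intervals.append((a.x - r, a.x + r))
    intervals.sort()
    reach = 0.0  # right end of the gap-free prefix found by the left-to-right sweep
    for lo, hi in intervals:
        if lo > reach:
            return (reach + lo) / 2.0  # midpoint of the first gap
        reach = max(reach, hi)
    return None if reach >= 1.0 else (reach + 1.0) / 2.0


def zooming_truncated_ucb(reward_fn, T, u, eps, delta, rng):
    """Zooming-style UCB on [0, 1] (metric |x - y|) with truncated-mean estimates."""
    log_inv_delta = math.log(1.0 / delta)
    arms = []
    for _ in range(T):
        # Activation rule: if some point is not covered by a confidence ball, activate it.
        x_new = find_uncovered(arms, u, eps, log_inv_delta)
        if x_new is not None:
            arms.append(Arm(x_new))
        # Selection rule: play the arm maximizing the index mean + 2 * radius.
        best = max(arms, key=lambda a: a.mean() + 2.0 * a.radius(u, eps, log_inv_delta))
        best.update(reward_fn(best.x, rng), u, eps, log_inv_delta)
    return arms


def heavy_tailed_reward(x, rng, alpha=1.8, scale=0.1):
    # Toy Lipschitz mean (slope 0.5) plus symmetric Lomax (shifted Pareto) noise. The noise
    # has finite p-th moments only for p < alpha, so its variance is infinite at alpha = 1.8.
    mean = 0.5 * (1.0 - abs(x - 0.3))
    noise = scale * (rng.paretovariate(alpha) - 1.0)
    return mean + (noise if rng.random() < 0.5 else -noise)


if __name__ == "__main__":
    # eps = 0.5 is valid because 1 + eps < alpha; u = 0.7 is assumed to bound E|X|^(1+eps).
    arms = zooming_truncated_ucb(heavy_tailed_reward, T=5000, u=0.7, eps=0.5,
                                 delta=0.01, rng=random.Random(0))
    top = max(arms, key=lambda a: a.n)
    print(f"{len(arms)} active arms; most played x = {top.x:.3f} ({top.n} plays)")
```

With a fixed `delta`, the truncation threshold for the s-th sample depends only on s, so each arm keeps a single running sum rather than its full sample history. A newly activated arm has an infinite radius, which both forces it to be played next and keeps the arm set covered until its estimate tightens.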
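
As a quick arithmetic check on the stated rate (worked here, not taken from the paper): the denominator of the exponent exceeds the numerator by exactly $\epsilon > 0$, so the exponent is strictly below 1, which is what makes the regret sublinear. Its two extremes and one intermediate case are

$$
\left.\frac{d_z\epsilon + 1}{d_z\epsilon + \epsilon + 1}\right|_{\epsilon = 1} = \frac{d_z + 1}{d_z + 2},
\qquad
\lim_{\epsilon \to 0^{+}} \frac{d_z\epsilon + 1}{d_z\epsilon + \epsilon + 1} = 1,
\qquad
\left.\frac{d_z\epsilon + 1}{d_z\epsilon + \epsilon + 1}\right|_{d_z = 1,\ \epsilon = 1/2} = \frac{3/2}{2} = \frac{3}{4}.
$$

For $\epsilon = 1$ (finite variance) the bound reduces to $\widetilde{O}(T^{(d_z+1)/(d_z+2)})$, the familiar exponent of zooming algorithms for bounded rewards, while heavier tails (smaller $\epsilon$) push the exponent toward linear regret.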
