

Poster

On Interpolating Experts and Multi-Armed Bandits

Houshuang Chen · Yuchen He · Chihao Zhang

Hall C 4-9 #1602

Abstract: Learning with expert advice and the multi-armed bandit are two classic online decision problems that differ in how information is observed in each round of the game. We study a family of problems interpolating between the two. For a vector $m = (m_1, \ldots, m_K) \in \mathbb{N}^K$, an instance of $m$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $m$-MAB and design an optimal PAC algorithm for its pure exploration version, $m$-BAI, where the goal is to identify the arm with minimum loss in as few rounds as possible. We show that the minimax regret of $m$-MAB is $\Theta\big(\sqrt{T \sum_{k=1}^{K} \log(m_k+1)}\big)$ and the minimum number of pulls for an $(\varepsilon, 0.05)$-PAC algorithm of $m$-BAI is $\Theta\big(\frac{1}{\varepsilon^2} \sum_{k=1}^{K} \log(m_k+1)\big)$. Both our upper bounds and lower bounds for $m$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the *clique cover* and related graph parameters. As consequences, we obtain tight minimax regret bounds for several families of feedback graphs.
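The feedback model described above — pulling one arm reveals the losses of every arm in its group — can be sketched in a few lines. This is an illustrative toy environment, not code from the paper; the class and method names (`MMABEnvironment`, `pull`) and the uniform loss distribution are assumptions made purely for the example.

```python
import random

class MMABEnvironment:
    """Toy sketch of the m-MAB feedback model: arms are partitioned into
    K groups of sizes m_1, ..., m_K, and pulling any arm reveals the
    losses of all arms in the same group (names/losses are illustrative)."""

    def __init__(self, group_sizes, seed=0):
        self.group_sizes = group_sizes          # the vector m = (m_1, ..., m_K)
        self.rng = random.Random(seed)
        # Map each arm index to the index of its group.
        self.arm_to_group = []
        for k, m_k in enumerate(group_sizes):
            self.arm_to_group.extend([k] * m_k)
        self.num_arms = len(self.arm_to_group)

    def pull(self, arm):
        """Pull one arm; observe the losses of ALL arms in its group."""
        k = self.arm_to_group[arm]
        start = sum(self.group_sizes[:k])
        group_arms = range(start, start + self.group_sizes[k])
        # Losses drawn i.i.d. uniform on [0, 1] purely for illustration.
        return {a: self.rng.random() for a in group_arms}

env = MMABEnvironment([2, 3, 1])   # K = 3 groups, 6 arms total
feedback = env.pull(3)             # arm 3 lies in the second group (arms 2, 3, 4)
print(sorted(feedback))            # -> [2, 3, 4]
```

Setting every $m_k = 1$ recovers the standard bandit (only the pulled arm's loss is seen), while $K = 1$ with $m_1 = N$ recovers full-information learning with expert advice, which is the interpolation the abstract refers to.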
