Abstract:
Learning with expert advice and multi-armed bandits are two classic online decision problems which differ in how information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in\mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $k$-th group contains $m_k$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss using as few pulls as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^{K}\log(m_k+1)}\right)$ and that the minimum number of pulls for an $(\varepsilon,0.05)$-PAC algorithm for $\mathbf{m}$-BAI is $\Theta\left(\frac{1}{\varepsilon^2}\cdot\sum_{k=1}^{K}\log(m_k+1)\right)$. Both our upper and lower bounds for $\mathbf{m}$-MAB can be extended to a more general setting, namely bandits with graph feedback, in terms of the *clique cover* and related graph parameters. As consequences, we obtain tight minimax regret bounds for several families of feedback graphs.
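To see how the bound interpolates the two classic settings, one can specialize it to the two extreme choices of $\mathbf{m}$ (a brief illustrative check; here $n$ denotes the total number of arms, a symbol introduced only for this example). A single group of size $n$ gives full information, while $n$ singleton groups give standard bandit feedback:
\[
  \mathbf{m}=(n):\quad
    \Theta\!\left(\sqrt{T\log(n+1)}\right) = \Theta\!\left(\sqrt{T\log n}\right)
    \quad \text{(learning with expert advice)},
\]
\[
  \mathbf{m}=(\underbrace{1,\dots,1}_{n}):\quad
    \Theta\!\left(\sqrt{T\,n\log 2}\right) = \Theta\!\left(\sqrt{Tn}\right)
    \quad \text{(multi-armed bandits)},
\]
recovering the classic $\Theta(\sqrt{T\log n})$ rate for experts and the $\Theta(\sqrt{Tn})$ rate for bandits.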