

Poster in Workshop: Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact

Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem

Avrim Blum · Kavya Ravichandran


Abstract: We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has $k$ arms, each of which has a reward function that is concave and increasing in the number of times that arm has been pulled so far. This models decision-making scenarios where performance at a task improves with practice, but the performance curves are unknown to the agent a priori. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then give a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor when it is told in advance the maximum reward achievable by the optimal arm. Finally, we show how to remove this assumption at the cost of an extra $O(\log k)$ factor, achieving an overall $O(\sqrt{k} \log k)$ approximation.
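To make the setting concrete, here is a minimal Python sketch of an improving-bandits instance and a naive baseline policy. It is illustrative only, not the paper's algorithm: the curve family $f_i(t) = c_i(1 - \gamma_i^t)$ (one concave, increasing choice among many), the explore-then-commit baseline, and all parameter values are assumptions made for this sketch.

```python
# Minimal sketch of the improving multi-armed bandits setting.
# Assumption: each arm's reward curve is f_i(t) = c_i * (1 - gamma_i**t),
# which is concave and increasing in the pull count t.
import random

def make_instance(k, rng):
    """Build k arms, each with a concave, increasing reward curve."""
    arms = []
    for _ in range(k):
        c = rng.uniform(0.1, 1.0)       # asymptotic reward level (hypothetical)
        gamma = rng.uniform(0.5, 0.99)  # curvature of the improvement curve
        arms.append(lambda t, c=c, g=gamma: c * (1.0 - g ** t))
    return arms

def optimal_reward(arms, T):
    """OPT plays the single best arm for all T pulls (best in hindsight)."""
    return max(sum(f(t) for t in range(1, T + 1)) for f in arms)

def explore_then_commit(arms, T, probe=5):
    """Naive baseline (not the paper's method): pull each arm `probe` times,
    then commit to the arm whose last observed reward was highest.
    Rewards accrued while probing count toward the total."""
    total, pulls = 0.0, [0] * len(arms)
    last = [0.0] * len(arms)
    for i, f in enumerate(arms):
        for _ in range(min(probe, T)):
            pulls[i] += 1
            last[i] = f(pulls[i])
            total += last[i]
    best = max(range(len(arms)), key=lambda i: last[i])
    for _ in range(max(0, T - sum(pulls))):
        pulls[best] += 1
        total += arms[best](pulls[best])
    return total

rng = random.Random(0)
arms = make_instance(k=10, rng=rng)
T = 200
opt = optimal_reward(arms, T)
alg = explore_then_commit(arms, T)
print(f"OPT = {opt:.1f}, ETC = {alg:.1f}, approximation factor = {opt / alg:.2f}")
```

The ratio printed at the end is the approximation factor the abstract refers to: total reward of the best fixed arm in hindsight divided by the algorithm's total reward. The lower bound says no online algorithm can keep this ratio below $\Omega(\sqrt{k})$ on every instance.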
