Poster in Workshop: Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact
Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem
Avrim Blum · Kavya Ravichandran
Abstract:
We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has $k$ arms, each of whose reward functions is concave and increasing in the number of times that arm has been pulled so far. This models decision-making scenarios where performance at a task improves with practice, but the performance curves are unknown to the agent a priori. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then give a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor, provided it is told in advance the maximum reward achievable by the optimal arm. Finally, we show how to remove this assumption at the cost of an extra $O(\log k)$ factor, achieving an overall $O(\sqrt{k} \log k)$ approximation.
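To make the setting concrete, here is a minimal Python sketch of an improving-bandits environment. The `ImprovingBandit` class and the three reward curves are illustrative assumptions, not from the paper; the model only requires that each arm's reward be a concave, increasing function of that arm's own pull count, unknown to the agent in advance.

```python
import math

# Minimal sketch of the improving multi-armed bandits setting described
# in the abstract. The reward curves below are hypothetical examples;
# the model assumes only that each arm's reward is a concave, increasing
# function of how many times that arm has been pulled so far.

class ImprovingBandit:
    def __init__(self, reward_curves):
        # reward_curves: list of k functions f_i(n) giving arm i's reward
        # on its n-th pull; each is assumed concave and increasing in n.
        self.reward_curves = reward_curves
        self.pulls = [0] * len(reward_curves)

    def pull(self, arm):
        # Reward depends on how many times THIS arm has been pulled,
        # capturing "performance improves with practice."
        self.pulls[arm] += 1
        return self.reward_curves[arm](self.pulls[arm])

# Example instance with k = 3 hypothetical concave, increasing curves.
curves = [
    lambda n: math.sqrt(n),      # steady improvement
    lambda n: 1 - 0.5 ** n,      # fast improvement, low ceiling
    lambda n: math.log(1 + n),   # slow but unbounded improvement
]
env = ImprovingBandit(curves)
total = sum(env.pull(0) for _ in range(10))  # practice arm 0 ten times
print(f"cumulative reward from arm 0: {total:.2f}")
```

The difficulty the paper addresses is visible even in this toy instance: an arm that looks weak after a few pulls (like the logarithmic curve) may dominate in the long run, so an online algorithm must balance practicing arms against abandoning them.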