

Poster

The Batch Complexity of Bandit Pure Exploration

Adrienne Tuynman · Rémy Degenne

West Exhibition Hall B2-B3 #W-917
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

In a fixed-confidence pure exploration problem in stochastic multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms' distributions. We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations. We give an instance-dependent lower bound on the number of batches used by any sample-efficient algorithm for any pure exploration task. We then give a general batched algorithm and prove upper bounds on its expected sample complexity and batch complexity. We illustrate both lower and upper bounds on best-arm identification and thresholding bandits.

Lay Summary:

With access to a set of options, we want to answer a question about these options. In a clinical trial setting, such questions could be: which treatment is the best? Which treatments are efficient enough when compared to an efficiency threshold? To answer the question, we can sample the options. To learn quickly, it can be a good idea to adapt to past observations, for example by not resampling a very bad option if we are looking for the best one. However, adapting at every single time step is computationally expensive and not always practically doable. How, then, does one balance learning speed against the number of adaptivity rounds (how many times one has to recompute and reevaluate)? We study how many rounds of adaptivity are necessary for a given learning speed. We also give a general algorithm that can answer any possible question, with guarantees on its speed and its number of adaptivity rounds.
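To make the idea of batched adaptivity concrete, here is a minimal sketch of a generic batched elimination strategy for best-arm identification. This is a textbook-style illustration under assumed Bernoulli arms and a simple Hoeffding-type confidence radius, not the paper's algorithm: sampling decisions change only at batch boundaries, so the number of adaptivity rounds equals the number of batches.

```python
import math
import random

def batched_elimination(means, delta=0.05, batch_size=100, seed=0):
    """Illustrative batched elimination for best-arm identification.

    `means` are the true Bernoulli arm means, used only to simulate pulls;
    the algorithm itself only sees samples. All parameters and the
    confidence radius below are illustrative choices, not the paper's.
    Returns (identified arm, number of batches used).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    active = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    batches = 0
    while len(active) > 1:
        batches += 1
        # Within a batch, pull every active arm a fixed number of times:
        # no adaptation happens inside the batch.
        for a in active:
            for _ in range(batch_size):
                sums[a] += 1.0 if rng.random() < means[a] else 0.0
                counts[a] += 1
        # Adaptation happens only here, at the batch boundary:
        # eliminate arms whose upper confidence bound falls below the
        # lower confidence bound of the empirically best arm.
        def radius(a):
            return math.sqrt(math.log(4 * counts[a] ** 2 / delta)
                             / (2 * counts[a]))
        best = max(active, key=lambda a: sums[a] / counts[a])
        lcb_best = sums[best] / counts[best] - radius(best)
        active = [a for a in active
                  if sums[a] / counts[a] + radius(a) >= lcb_best]
    return active[0], batches
```

With well-separated arms, only a few batches (adaptivity rounds) are needed before all suboptimal arms are eliminated, which is the trade-off the paper quantifies in general.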
