Timezone: »

Corralling a Band of Bandit Algorithms
Alekh Agarwal

Thu Aug 10 04:05 PM -- 04:45 PM (PDT) @

Ensemble techniques are well known as a mechanism to increase the adaptivity and robustness of machine learning methods. However, for interactive learners that drive data acquisition, building an ensemble presents novel challenges, since each learner might seek to collect different data. In this talk, we examine this question from the lens of contextual bandit learning. We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own. We show how the task is hopeless in general, but present a new approach to achieve this goal under natural assumptions. As examples, we present two main applications. The first is to create an algorithm that enjoys worst-case robustness while at the same time performing much better when the environment is relatively easy. The second is to create an algorithm that works simultaneously under different assumptions of the environment, such as different priors or different loss structures.

Author Information

Alekh Agarwal (Microsoft Research)

More from the Same Authors