Timezone: »

 
Spotlight
Best Model Identification: A Rested Bandit Formulation
Leonardo Cella · Massimiliano Pontil · Claudio Gentile

Wed Jul 21 07:35 AM -- 07:40 AM (PDT) @ None

We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arms are themselves learning algorithms whose expected losses decrease with the number of times the arm has been played. The shape of the expected loss functions is similar across arms, and is assumed to be available up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases. The actual rate of convergence depends in a detailed way on the postulated functional form of the expected losses. We complement our analysis with lower bounds, indicating strengths and limitations of the proposed solution.

Author Information

Leonardo Cella (Italian Institute of Technology)
Massimiliano Pontil (Istituto Italiano di Tecnologia and University College London)

-

Claudio Gentile (Google Research)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors