Skip to yearly menu bar Skip to main content


Poster

Best Arm Identification in Linear Bandits with Linear Dimension Dependency

Chao Tao · Saúl A. Blanco · Yuan Zhou

Hall B #98

Abstract: We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.

Live content is unavailable. Log in and register to view live content