Skip to yearly menu bar Skip to main content


Poster

Bilinear Bandits with Low-rank Structure

Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak

Pacific Ballroom #127

Keywords: [ Bandits ]


Abstract: We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The unknown in the problem is a d1 by d2 matrix Θ that defines the reward, and has low rank rmin{d1,d2}. Determination of Θ with this low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called Explore-Subspace-Then-Refine'' (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called almost-low-dimensional OFUL'' (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is O~((d1+d2)3/2rT) where O~ hides logarithmic factors and T is the time horizon, which improves upon the regret of O~(d1d2T) attained for a na\"ive linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a na\"ive linear bandit reduction.

Live content is unavailable. Log in and register to view live content