Timezone: »

Bilinear Bandits with Low-rank Structure
Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak

Wed Jun 12 12:00 PM -- 12:05 PM (PDT) @ Seaside Ballroom
We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The problem is motivated by numerous applications in which the learner must recommend two different entity types as a single action, such as a male / female pair in an online dating service. The unknown in the problem is a $d_1$ by $d_2$ matrix $\mathbf{\Theta}^*$ that defines the reward, and has low rank $r \ll \min\{d_1,d_2\}$. Determination of $\mathbf{\Theta}^*$ with this low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called ``Explore-Subspace-Then-Refine'' (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called ``almost-low-dimensional OFUL'' (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is $\widetilde{\mathcal{O}}((d_1+d_2)^{3/2} \sqrt{r T})$ where $\widetilde{\mathcal{O}}$ hides logarithmic factors and $T$ is the time horizon. This improves upon the regret of $\widetilde{\mathcal{O}}(d_1d_2\sqrt{T})$ attained for a na\"ive linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors. A preliminary experiment shows that ESTR outperforms a na\"ive linear bandit reduction.

Author Information

Kwang-Sung Jun (Boston University)
Rebecca Willett (U Chicago)
Stephen Wright (University of Wisconsin-Madison)
Robert Nowak (University of Wisconsion-Madison)
Robert Nowak

Robert Nowak holds the Nosbusch Professorship in Engineering at the University of Wisconsin-Madison, where his research focuses on signal processing, machine learning, optimization, and statistics.

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors