We introduce the bilinear bandit problem with lowrank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The problem is motivated by numerous applications in which the learner must recommend two different entity types as a single action, such as a male / female pair in an online dating service. The unknown in the problem is a $d1$ by $d2$ matrix $\mathbf{\Theta}^$ that defines the reward, and has low rank $r \ll \min{d_1,d_2}$. Determination of $\mathbf{\Theta}^$ with this lowrank structure poses a significant challenge in finding the right explorationexploitation tradeoff. In this work, we propose a new twostage algorithm called ExploreSubspaceThenRefine'' (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called
almostlowdimensional OFUL'' (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is $\widetilde{\mathcal{O}}((d1+d2)^{3/2} \sqrt{r T})$ where $\widetilde{\mathcal{O}}$ hides logarithmic factors and $T$ is the time horizon. This improves upon the regret of $\widetilde{\mathcal{O}}(d1d2\sqrt{T})$ attained for a na\"ive linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors. A preliminary experiment shows that ESTR outperforms a na\"ive linear bandit reduction.
Author Information
KwangSung Jun (Boston University)
Rebecca Willett (U Chicago)
Stephen Wright (University of WisconsinMadison)
Robert Nowak (University of WisconsionMadison)
Robert Nowak holds the Nosbusch Professorship in Engineering at the University of WisconsinMadison, where his research focuses on signal processing, machine learning, optimization, and statistics.
Related Events (a corresponding poster, oral, or spotlight)

2019 Poster: Bilinear Bandits with Lowrank Structure »
Wed Jun 12th 06:30  09:00 PM Room Pacific Ballroom
More from the Same Authors

2019 Poster: FirstOrder Algorithms Converge Faster than $O(1/k)$ on Convex Problems »
Chingpei Lee · Stephen Wright 
2019 Oral: FirstOrder Algorithms Converge Faster than $O(1/k)$ on Convex Problems »
Chingpei Lee · Stephen Wright 
2019 Poster: Blended Conditonal Gradients »
Gábor Braun · Sebastian Pokutta · Dan Tu · Stephen Wright 
2019 Oral: Blended Conditonal Gradients »
Gábor Braun · Sebastian Pokutta · Dan Tu · Stephen Wright 
2019 Tutorial: Active Learning: From Theory to Practice »
Robert Nowak · Steve Hanneke 
2018 Poster: Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs »
Bin Hu · Stephen Wright · Laurent Lessard 
2018 Oral: Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs »
Bin Hu · Stephen Wright · Laurent Lessard 
2017 Poster: Algebraic Variety Models for HighRank Matrix Completion »
Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak 
2017 Talk: Algebraic Variety Models for HighRank Matrix Completion »
Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak