Leveraging Good Representations in Linear Contextual Bandits

Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta

[ Abstract ] [ Livestream: Visit Bandits 5 ] [ Paper ]
Thu 22 Jul 8:50 p.m. — 8:55 p.m. PDT

The linear contextual bandit literature is mostly focused on the design of efficient learning algorithms for a given representation. However, a contextual bandit problem may admit multiple linear representations, each one with different characteristics that directly impact the regret of the learning algorithm. In particular, recent works showed that there exist ``good'' representations for which constant problem-dependent regret can be achieved. In this paper, we first provide a systematic analysis of the different definitions of ``good'' representations proposed in the literature. We then propose a novel selection algorithm able to adapt to the best representation in a set of $M$ candidates. We show that the regret is indeed never worse than the regret obtained by running \textsc{LinUCB} on best representation (up to a $\ln M$ factor). As a result, our algorithm achieves constant regret if a ``good'' representation is available in the set. Furthermore, we show the algorithm may still achieve constant regret by implicitly constructing a ``good'' representation, even when none of the initial representations is ``good''. Finally, we validate our theoretical findings in a number of standard contextual bandit problems.

Chat is not available.