Abstract:
The linear contextual bandit literature is mostly focused on the design of efficient learning algorithms for a given representation.
However, a contextual bandit problem may admit multiple linear representations, each one with different characteristics that directly impact the regret of the learning algorithm. In particular, recent works showed that there exist ``good'' representations for which constant problem-dependent regret can be achieved.
In this paper, we first provide a systematic analysis of the different definitions of ``good'' representations proposed in the literature. We then propose a novel selection algorithm able to adapt to the best representation in a set of $M$ candidates. We show that the regret is indeed never worse than the regret obtained by running \textsc{LinUCB} on best representation (up to a $\ln M$ factor). As a result, our algorithm achieves constant regret if a ``good'' representation is available in the set. Furthermore, we show the algorithm may still achieve constant regret by implicitly constructing a ``good'' representation, even when none of the initial representations is ``good''. Finally, we validate our theoretical findings in a number of standard contextual bandit problems.
Chat is not available.
Successful Page Load