Timezone: »

On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits
Weitong Zhang · Jiafan He · Zhiyuan Fan · Quanquan Gu

Tue Jul 25 02:00 PM -- 04:30 PM (PDT) @ Exhibit Hall 1 #627
We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O(\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O ({d^2} /{\Delta})$ as in the well-specified setting up to logarithmic factors. Given this result, we show that the existing SupLinUCB algorithm (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between the misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O({\Delta} / \sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde \Omega({\Delta} / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.

Author Information

Weitong Zhang (University of California, Los Angeles)
Jiafan He (University of California, Los Angeles)
Zhiyuan Fan (Tsinghua University)
Quanquan Gu (University of California, Los Angeles)

More from the Same Authors