A Reduction from Linear Contextual Bandits Lower Bounds to Estimations Lower Bounds

Jiahao He · Jiheng Zhang · Rachel Q. Zhang

Hall E #1113

Keywords: [ T: Probabilistic Methods ] [ T: Online Learning and Bandits ]


Linear contextual bandits and their variants are usually solved using algorithms guided by parameter estimation. Cauchy-Schwartz inequality established that estimation errors dominate algorithm regrets, and thus, accurate estimators suffice to guarantee algorithms with low regrets. In this paper, we complete the reverse direction by establishing the necessity. In particular, we provide a generic transformation from algorithms for linear contextual bandits to estimators for linear models, and show that algorithm regrets dominate estimation errors of their induced estimators, i.e., low-regret algorithms must imply accurate estimators. Moreover, our analysis reduces the regret lower bound to an estimation error, bridging the lower bound analysis in linear contextual bandit problems and linear regression.

Chat is not available.