Spotlight
A Reduction from Linear Contextual Bandits Lower Bounds to Estimations Lower Bounds
Jiahao He · Jiheng Zhang · Rachel Q. Zhang
Ballroom 3 & 4
Linear contextual bandits and their variants are usually solved with algorithms guided by parameter estimation. The Cauchy–Schwarz inequality establishes that estimation errors dominate algorithm regret, so accurate estimators suffice to guarantee low-regret algorithms. In this paper, we complete the reverse direction by establishing the necessity. In particular, we provide a generic transformation from algorithms for linear contextual bandits to estimators for linear models, and show that algorithm regret dominates the estimation error of the induced estimator, i.e., low-regret algorithms imply accurate estimators. Moreover, our analysis reduces regret lower bounds to estimation error lower bounds, bridging the lower bound analyses of linear contextual bandit problems and linear regression.
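For intuition, here is a minimal sketch of the known direction, under an illustrative setup not taken from the paper: a reward parameter \theta^\ast, action features with \|x_a\| \le 1, an estimate \hat\theta_t, and a greedy choice a_t \in \arg\max_a \langle x_a, \hat\theta_t \rangle compared against the optimal action a_t^\ast. Cauchy–Schwarz then bounds the per-round regret by the estimation error:

\begin{align*}
r_t &= \langle x_{a_t^\ast} - x_{a_t}, \theta^\ast \rangle \\
    &= \langle x_{a_t^\ast}, \theta^\ast - \hat\theta_t \rangle
       + \underbrace{\langle x_{a_t^\ast} - x_{a_t}, \hat\theta_t \rangle}_{\le\, 0 \text{ by greediness}}
       + \langle x_{a_t}, \hat\theta_t - \theta^\ast \rangle \\
    &\le \|x_{a_t^\ast}\| \, \|\theta^\ast - \hat\theta_t\|
       + \|x_{a_t}\| \, \|\hat\theta_t - \theta^\ast\|
     \;\le\; 2\,\|\hat\theta_t - \theta^\ast\|.
\end{align*}

The paper's contribution concerns the converse: from any low-regret algorithm, an estimator can be constructed whose error is controlled by the regret, so regret lower bounds follow from estimation lower bounds.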