Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity
Jonathan Lee · Weihao Kong · Aldo Pacchiano · Vidya Muthukumar · Emma Brunskill
While much of bandit learning focuses on minimizing regret while learning an optimal policy, it is often of interest to estimate the maximum value achievable before learning the optimal policy, which can be useful as an input to downstream tasks like model selection. Prior work in contextual bandits has considered this in the Gaussian setting. Here, we study the problem of approximating the optimal policy value in the more general linear contextual bandit problem, and we focus on whether it is possible to do so with less data than is needed to learn the optimal policy. We consider two objectives: (1) estimating upper bounds on the value and (2) estimating the value directly. For the first, we present an adaptive upper bound that is at most a logarithmic factor larger than the value and tight when the data is Gaussian, and we show that it is possible to estimate this upper bound in $\widetilde{\mathcal{O}}(\sqrt{d})$ samples, where $d$ is the number of parameters. As a consequence of this bound, we show improved regret bounds for model selection. For the second objective, we present a moment-based algorithm for estimating the optimal policy value with sample complexity $\widetilde{\mathcal{O}}(\sqrt{d})$ for sub-Gaussian context distributions whose low-order moments are known.

Author Information

Jonathan Lee (Stanford University)
Weihao Kong (University of Washington)
Aldo Pacchiano (UC Berkeley)
Vidya Muthukumar (Georgia Institute of Technology)
Emma Brunskill (Stanford University)
Emma Brunskill

Emma Brunskill is a tenured associate professor in the Computer Science Department at Stanford University. Brunskill’s lab aims to create AI systems that learn from few samples to robustly make good decisions and is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill has received an NSF CAREER award, an Office of Naval Research Young Investigator Award, a Microsoft Faculty Fellow award, and an alumni impact award from the computer science and engineering department at the University of Washington. Brunskill and her lab have received multiple best paper nominations and awards both for their AI and machine learning work (UAI best paper, Reinforcement Learning and Decision Making Symposium best paper twice) and for their work in AI for education (Intelligent Tutoring Systems Conference, Educational Data Mining conference three times, CHI).

More from the Same Authors