Poster
in
Workshop: Foundations of Reinforcement Learning and Control: Connections and Perspectives
Optimistic Information Directed Sampling
Gergely Neu · Matteo Papini · Ludovic Schwartz
We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy [2018] and the worst-case theory of Foster, Kakade, Qian, and Rakhlin [2021] based on the decision-estimation coefficient. Drawing from both lines of work, we propose an algorithmic template called Optimistic Information-Directed Sampling and show that it can achieve instance-dependent regret guarantees similar to the ones achievable by the classic Bayesian IDS method, but with the major advantage of not requiring any Bayesian assumptions. The key technical innovation of our analysis is introducing an optimistic surrogate model for the regret and using it to define a frequentist version of the Information Ratio of Russo and Van Roy [2018], and a less conservative version of the Decision-Estimation Coefficient of Foster et al. [2021].
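The abstract builds on the information ratio of Russo and Van Roy [2018], which trades off squared expected regret against expected information gain. As a minimal sketch of the classic (Bayesian) IDS decision rule that the paper takes as its starting point, the snippet below computes, for a finite-armed problem, the action distribution minimizing the information ratio. It assumes the per-arm regret estimates `deltas` and information-gain estimates `gains` are given (how they are estimated is exactly what distinguishes the Bayesian and the paper's frequentist/optimistic versions); it exploits the known fact that the minimizer is supported on at most two arms.

```python
import numpy as np

def ids_distribution(deltas, gains, grid=1001):
    """Information-Directed Sampling (Russo and Van Roy, 2018):
    return the distribution pi over arms minimizing the information
    ratio (pi . deltas)^2 / (pi . gains).

    deltas : array of estimated per-arm expected regrets (>= 0)
    gains  : array of estimated per-arm information gains (>= 0)

    The minimizer is supported on at most two arms, so we search
    over arm pairs and a grid of mixing probabilities.
    """
    deltas = np.asarray(deltas, dtype=float)
    gains = np.asarray(gains, dtype=float)
    K = len(deltas)
    qs = np.linspace(0.0, 1.0, grid)
    best_ratio, best_pi = np.inf, None
    for i in range(K):
        for j in range(K):
            # pi puts mass q on arm i and (1 - q) on arm j
            regret = qs * deltas[i] + (1 - qs) * deltas[j]
            info = qs * gains[i] + (1 - qs) * gains[j]
            # ratio is +inf wherever the information gain is zero
            ratio = np.divide(regret**2, info,
                              out=np.full_like(info, np.inf),
                              where=info > 0)
            k = int(np.argmin(ratio))
            if ratio[k] < best_ratio:
                best_ratio = ratio[k]
                best_pi = np.zeros(K)
                best_pi[i] += qs[k]
                best_pi[j] += 1.0 - qs[k]
    return best_pi, best_ratio
```

For example, with `deltas = [0.5, 0.1, 0.3]` and `gains = [0.2, 0.01, 0.4]`, the returned distribution mixes the low-regret arm with the high-information arm whenever that lowers the ratio below what any single arm achieves. The optimistic variant proposed in the paper replaces the Bayesian regret surrogate in `deltas` with an optimistic, frequentist one.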