Prior Diffusiveness and Regret in the Linear-Gaussian Bandit
Yifan Zhu ⋅ John Duchi ⋅ Benjamin Van Roy
Abstract
We prove that Thompson sampling exhibits $\tilde{O}(\sigma d \sqrt{T} + d r \sqrt{\mathrm{Tr}(\Sigma_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(\mu_0, \Sigma_0)$ prior distribution on the coefficients, where $d$ is the dimension, $T$ is the time horizon, $r$ is the maximum $\ell_2$ norm of the actions, and $\sigma^2$ is the noise variance. In contrast to existing regret bounds, this shows that, to within logarithmic factors, the prior-dependent "burn-in" term $d r \sqrt{\mathrm{Tr}(\Sigma_0)}$ decouples additively from the minimax (long-run) regret $\sigma d \sqrt{T}$; previous regret bounds exhibit a multiplicative dependence between these terms. We establish these results via a new "elliptical potential" lemma, and also provide a lower bound indicating that the burn-in term is unavoidable.
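For concreteness, the setting analyzed above can be illustrated with a minimal sketch of Thompson sampling in the linear-Gaussian bandit: the agent maintains the exact Gaussian posterior over the coefficient vector, samples from it each round, and plays the action that is greedy for the sample. This is an illustrative implementation, not the paper's code; all function and variable names here are hypothetical.

```python
import numpy as np

def thompson_sampling_linear_gaussian(actions, theta_star, mu0, Sigma0, sigma, T, rng):
    """Hypothetical sketch of Thompson sampling for a linear-Gaussian bandit.

    actions: (K, d) array of arms; playing arm a yields reward
             a @ theta_star + N(0, sigma^2) noise.
    Maintains the conjugate Gaussian posterior N(mu_t, Sigma_t) over theta_star,
    starting from the prior N(mu0, Sigma0). Returns the cumulative regret.
    """
    # Precision-form posterior: Lambda = Sigma^{-1}, b = Sigma^{-1} @ mu
    Lambda = np.linalg.inv(Sigma0)
    b = Lambda @ mu0
    opt = (actions @ theta_star).max()  # best achievable mean reward
    regret = 0.0
    for _ in range(T):
        Sigma_t = np.linalg.inv(Lambda)
        mu_t = Sigma_t @ b
        theta = rng.multivariate_normal(mu_t, Sigma_t)  # posterior sample
        a = actions[np.argmax(actions @ theta)]         # greedy for the sample
        r = a @ theta_star + sigma * rng.standard_normal()
        # Conjugate Gaussian update with the observed (action, reward) pair
        Lambda += np.outer(a, a) / sigma**2
        b += a * r / sigma**2
        regret += opt - a @ theta_star
    return regret
```

In this sketch, a diffuse prior (large $\mathrm{Tr}(\Sigma_0)$) forces early posterior samples to be spread out, producing the burn-in regret the abstract refers to, after which the posterior concentrates and per-round regret decays.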