ICML 2023 Delayed Feedback in Kernel Bandits Oral

Oral

Delayed Feedback in Kernel Bandits

Sattar Vakili · Danyal Ahmed · Alberto Bernacchia · Ciara Pike-Burke

Exhibit Hall 2

[ Abstract ] [ Visit Oral A1 Causal Learning, RL, Personalization ]

[ PDF]

Abstract: Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit problem (also known as Bayesian optimisation), where a learner aims at optimising a kernelized function through sequential noisy observations. The existing work predominantly assumes feedback is immediately available; an assumption which fails in many real world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with

\tilde{O} (\sqrt{Γ_{k} (T) T} + E [τ])

$\tilde{\mathcal{O}}\left(\sqrt{\Gamma_k(T) T}+\mathbb{E}[\tau]\right)$ regret, where

T

$T$ is the number of time steps,

Γ_{k} (T)

$\Gamma_k(T)$ is the maximum information gain of the kernel with

T

$T$ observations, and

τ

$\tau$ is the delay random variable. This represents a significant improvement over the state of the art regret bound of

\tilde{O} (Γ_{k} (T) \sqrt{T} + E [τ] Γ_{k} (T))

$\tilde{\mathcal{O}}\left(\Gamma_k(T)\sqrt{ T}+\mathbb{E}[\tau]\Gamma_k(T)\right)$ reported in (Verma et al., 2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.

Chat is not available.