Invited Speaker
in
Workshop: Complex feedback in online learning
Delayed Feedback in Generalised Linear Bandits Revisited
Ciara Pike-Burke
In this talk, I will revisit the stochastic generalised linear bandit problem with stochastically delayed rewards. For this problem, we show that improved regret bounds can be obtained with a standard optimistic algorithm that uses only the rewards received so far. In contrast to prior work, we show that the confidence sets need not be inflated to account for the delays. This improves the regret guarantee from Õ(d \sqrt{T} + \sqrt{dT} E[τ]) to Õ(d \sqrt{T} + d^{3/2} E[τ]), where E[τ] denotes the expected delay, d is the dimension, T is the time horizon, and Õ suppresses logarithmic terms. Our results thus decouple the impact of the horizon from that of the delay, and more closely match what has been seen in the stochastically delayed K-armed bandit setting. We verify our theoretical results through experiments on simulated data. This is joint work with Ben Howson and Sarah Filippi.
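To make the decoupling concrete, the two bounds quoted in the abstract can be compared numerically. The sketch below is only an illustration: it drops all constants and logarithmic factors, the function names `prior_bound` and `new_bound` are hypothetical, and the values of d, T and E[τ] are arbitrary choices rather than settings from the talk's experiments.

```python
import math


def prior_bound(d, T, expected_delay):
    """Prior guarantee up to constants and logs: d*sqrt(T) + sqrt(d*T)*E[tau]."""
    return d * math.sqrt(T) + math.sqrt(d * T) * expected_delay


def new_bound(d, T, expected_delay):
    """Improved guarantee up to constants and logs: d*sqrt(T) + d^(3/2)*E[tau]."""
    return d * math.sqrt(T) + d ** 1.5 * expected_delay


if __name__ == "__main__":
    d, expected_delay = 10, 50  # illustrative values only (assumption)
    for T in (10**2, 10**4, 10**6):
        # The delay term of the prior bound grows with sqrt(T);
        # the delay term of the new bound does not depend on T at all.
        print(T, prior_bound(d, T, expected_delay), new_bound(d, T, expected_delay))
```

The two delay terms coincide when d^2 = T, since \sqrt{dT} = d^{3/2} there. For longer horizons (T > d^2) the new delay term d^{3/2} E[τ] is the smaller of the two, and it stays fixed as T grows.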