Generalized Linear Bandits with Memory
Heesang Ann ⋅ Hyun-jun Choi ⋅ Taehyun Hwang ⋅ Younghoon Shin ⋅ Haeju Cheong ⋅ Min-hwan Oh
Abstract
We study generalized linear bandits with memory, a non-stationary setting in which rewards depend on past actions through a finite memory matrix. Building on prior work for linear models (Clerici et al., 2024), we show that the previously known $\tilde{\mathcal{O}}(T^{3/4})$ regret bound stems from a loose analysis based on cyclic proxy policies, and we refine the analysis to recover an $\tilde{\mathcal{O}}(\sqrt{T})$ regret rate in the linear case. We then extend this improvement to generalized linear models and propose a block-wise algorithm based on shrinkage-based confidence bounds. Our algorithm achieves a regret bound of $\tilde{\mathcal{O}}(\sqrt{mT} + d\sqrt{T} + \sqrt{\kappa}\, d^{2} m^{1/4} T^{1/4} + \kappa d^{2})$, where $d$ denotes the feature dimension, $m$ the memory length, and $\kappa$ a curvature parameter of the link function, thereby attaining a $\sqrt{T}$ rate despite nonlinear rewards and memory effects. To the best of our knowledge, our analysis provides a unified treatment of memory-induced non-stationarity and nonlinear link functions, while ensuring that the leading regret term is independent of the curvature of the link function. We conduct numerical experiments whose results are consistent with our theoretical findings.