Poster
Contextual Linear Bandits with Delay as Payoff
Mengxiao Zhang · Yingfei Wang · Haipeng Luo
West Exhibition Hall B2-B3 #W-920
Sequential decision-making applications often need to deal with delayed action outcomes: think of medical treatments and online advertising, where the payoff is not immediate and its delay is proportional to the payoff. A recent study [Schlisselberg et al. 2025] explored this problem but only for very basic multi-armed bandits scenarios. In our work, we tackle its contextual linear variant, where each action's payoff depends on its time-varying feature embedding, making the problem more realistic.To address this, we introduce a novel and efficient algorithm with strong theoretical and empirical guarantees. Contrary to standard linear bandit algorithms that construct least squares estimators and confidence ellipsoids, the main novelty of our algorithm is its phased arm elimination procedure by only selecting the volumetric spanners of the action set. This approach effectively addresses challenges arising from both payoff-dependent delays and large action sets.Our research initiates the study of contextual linear bandits with payoff-dependent delays, opening doors to more complicated real-world scenarios, including evolving and composite delayed feedback.
Live content is unavailable. Log in and register to view live content