
Workshop: Workshop on Reinforcement Learning Theory

Non-Stationary Representation Learning in Sequential Multi-Armed Bandits

Qin Yuzhen · Tommaso Menara · Samet Oymak · ShiNung Ching · Fabio Pasqualetti


Most existing theoretical studies of representation learning focus on batch tasks. However, in practical decision-making scenarios, the learner often observes tasks sequentially. In such sequential problems, learning good representations is more challenging because the underlying task representation may change over time. In this paper, we address non-stationary representation learning in sequential multi-armed linear bandits. We introduce an online algorithm that detects task switches and adaptively learns and transfers a non-stationary representation. We derive a regret upper bound for our algorithm that significantly improves on existing bounds for algorithms that do not learn the representation. Our bound provides theoretical insight into problem-dependent quantities and reveals the excess regret incurred by representation learning, non-stationarity, and task switch detection.
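To make the setting concrete, the sketch below simulates one common way to model a shared but non-stationary representation. It is not the authors' algorithm; every modeling choice here is an assumption. We assume each task's parameter is theta_j = B w_j, where B is a d x m orthonormal matrix (m much smaller than d) that is redrawn at unknown switch points. We estimate each task's parameter by plain least squares, recover a subspace from the stacked estimates with an SVD, and flag a possible switch when a new estimate lies mostly outside the learned subspace. The helper names (`make_tasks`, `ols_estimate`, `estimate_representation`, `subspace_distance`, `residual_ratio`) and the residual-ratio switch heuristic are hypothetical and are included only for illustration.

```python
import numpy as np


def make_tasks(d, m, n_tasks, switch_every, rng):
    """Hypothetical setting: task j has parameter theta_j = B w_j, where the
    d x m representation B is redrawn every `switch_every` tasks."""
    thetas, reps = [], []
    for j in range(n_tasks):
        if j % switch_every == 0:
            # Assumed switch model: draw a fresh orthonormal d x m basis.
            B, _ = np.linalg.qr(rng.standard_normal((d, m)))
        thetas.append(B @ rng.standard_normal(m))
        reps.append(B)
    return np.array(thetas), reps


def ols_estimate(theta, n_pulls, noise_std, rng):
    """Least-squares estimate of theta from n_pulls random actions with
    noisy linear rewards r = <x, theta> + noise (ignores any representation)."""
    X = rng.standard_normal((n_pulls, theta.size))
    r = X @ theta + noise_std * rng.standard_normal(n_pulls)
    theta_hat, *_ = np.linalg.lstsq(X, r, rcond=None)
    return theta_hat


def estimate_representation(theta_hats, m):
    """Top-m left singular vectors of the stacked d x k estimate matrix."""
    U, _, _ = np.linalg.svd(theta_hats.T, full_matrices=False)
    return U[:, :m]


def subspace_distance(B1, B2):
    """Spectral-norm distance between the projectors onto span(B1) and span(B2)."""
    return np.linalg.norm(B1 @ B1.T - B2 @ B2.T, 2)


def residual_ratio(theta_hat, B_hat):
    """Fraction of ||theta_hat|| lying outside span(B_hat); a large value
    suggests the task no longer fits the learned representation."""
    residual = theta_hat - B_hat @ (B_hat.T @ theta_hat)
    return np.linalg.norm(residual) / np.linalg.norm(theta_hat)


rng = np.random.default_rng(0)
d, m = 20, 3
# Tasks 0-9 share one representation; tasks 10-19 share a new one.
thetas, reps = make_tasks(d, m, n_tasks=20, switch_every=10, rng=rng)
theta_hats = np.array([ols_estimate(t, 200, 0.1, rng) for t in thetas])

# Learn a representation from the first 8 tasks (all before the switch).
B_hat = estimate_representation(theta_hats[:8], m)
print("subspace error:", subspace_distance(B_hat, reps[0]))
print("same-block task residual:", residual_ratio(theta_hats[8], B_hat))
print("post-switch task residual:", residual_ratio(theta_hats[12], B_hat))
```

With these settings, the subspace error and the residual of task 8 (same block) should be small. The residual of task 12 (after the switch) should be close to 1, because a random 3-dimensional subspace of R^20 captures little of a vector drawn from another. Once B_hat is trusted, a learner only needs to estimate the m coordinates w_j for a new task instead of all d entries of theta_j. This reduction is the intuition behind transferring a representation. The threshold on the residual ratio would control the trade-off between detection delay and false alarms.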
