Poster
in
Workshop: New Frontiers in Learning, Control, and Dynamical Systems
Balancing exploration and exploitation in Partially Observed Linear Contextual Bandits via Thompson Sampling
Hongju Park · Mohamad Kazem Shirani Faradonbeh
Contextual bandits constitute a popular framework for studying the exploration-exploitation trade-off under finitely many options with side information. The majority of existing works assume that contexts are perfectly observed, whereas in practice it is more realistic to assume that they are only partially observed. In this work, we study reinforcement learning algorithms for contextual bandits with partial observations. First, we consider different structures for partial observability and their corresponding optimal policies. Subsequently, we present and analyze reinforcement learning algorithms for partially observed contextual bandits with noisy linear observation structures. For these algorithms, which utilize Thompson sampling, we establish estimation accuracy and regret bounds under different structural assumptions.
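To illustrate the setting, the following is a minimal sketch of Thompson sampling in a linear contextual bandit where the learner sees only a noisy version of the context. This is not the authors' algorithm; all dimensions, noise levels, and the Gaussian observation model are illustrative assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes and noise levels (not from the paper)
d, n_arms, T = 3, 4, 500
reward_sd = 0.5      # reward noise std
obs_sd = 0.3         # context observation noise std (partial observability)

# True (unknown) arm parameters
theta_true = rng.normal(size=(n_arms, d))

# Per-arm Bayesian linear regression state with a standard normal prior
A = np.stack([np.eye(d) for _ in range(n_arms)])   # posterior precisions
b = np.zeros((n_arms, d))

regret = 0.0
for t in range(T):
    x = rng.normal(size=d)                           # latent context
    x_obs = x + rng.normal(scale=obs_sd, size=d)     # noisy observation of it

    # Thompson sampling: draw parameters from each arm's posterior,
    # then act greedily with respect to the sampled parameters
    sampled_values = []
    for a in range(n_arms):
        cov = np.linalg.inv(A[a])
        mean = cov @ b[a]
        theta_s = rng.multivariate_normal(mean, cov)
        sampled_values.append(x_obs @ theta_s)
    arm = int(np.argmax(sampled_values))

    # Reward is generated by the latent context, not the observed one
    r = x @ theta_true[arm] + rng.normal(scale=reward_sd)

    # Posterior update can only use the noisy observation
    A[arm] += np.outer(x_obs, x_obs)
    b[arm] += r * x_obs

    # Per-round regret against the oracle that knows x and theta_true
    regret += np.max(theta_true @ x) - x @ theta_true[arm]

print(f"average regret after {T} rounds: {regret / T:.3f}")
```

Note that the naive update above treats the noisy observation as if it were the true context; handling this mismatch in a principled way is precisely the kind of issue the structural assumptions in the paper address.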