

Poster in Workshop: Foundations of Reinforcement Learning and Control: Connections and Perspectives

Learning to Explore with Lagrangians for Bandits under Unknown Constraints

Udvas Das · Debabrota Basu


Abstract:

Pure exploration in bandits can model eclectic real-world decision-making problems, such as tuning hyper-parameters or conducting user studies, where sample frugality is desired. Thus, considering different safety, resource, and fairness constraints on the decision space has gained increasing attention. In this paper, we study a generalisation of these problems as pure exploration in multi-armed bandits with unknown linear constraints. First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints. We further derive how this lower bound converges to the existing lower bound for pure exploration under known constraints, and how the hardness of the problem changes with the geometry induced by the constraint estimation procedure. We then leverage the Lagrangian lower bound and properties of convex optimisation to propose two computationally efficient extensions of Track-and-Stop and Gamified Exploration, namely LATS and LAGEX. Designing these algorithms requires us to propose a new constraint-adaptive stopping rule and, at each step, to use pessimistic estimates of the constraints in the Lagrangian lower bound. We show that these algorithms asymptotically achieve the desired sample complexity bounds. Finally, we conduct numerical experiments with different reward distributions and constraints that validate the efficient performance of LAGEX and LATS with respect to the baselines.
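As context for the Lagrangian relaxation mentioned in the abstract, the following is a minimal sketch: the first display is the classical fixed-confidence lower bound for pure exploration (Garivier and Kaufmann, 2016), and the second is one plausible Lagrangian-relaxed form when the linear constraints are unknown. The alternative set Alt(mu), the multipliers nu, and the constraint-violation term g are illustrative notation introduced here and need not match the paper's exact formulation.

% Classical lower bound: any delta-correct strategy satisfies
\[
\mathbb{E}[\tau_\delta] \;\ge\; T^*(\mu)\,\mathrm{kl}(\delta, 1-\delta),
\qquad
T^*(\mu)^{-1} \;=\; \sup_{w \in \Delta_K}\; \inf_{\lambda \in \mathrm{Alt}(\mu)}\; \sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a).
\]
% Illustrative Lagrangian relaxation (assumed form): constraint violations enter
% through multipliers nu >= 0 instead of restricting the alternative set directly.
\[
T^*_{\mathcal{L}}(\mu)^{-1} \;=\; \sup_{w \in \Delta_K}\; \inf_{\lambda}\; \sup_{\nu \ge 0}\;
\Big[\, \sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a) \;-\; \nu^{\top} g(\lambda) \,\Big].
\]

In this reading, the pessimistic constraint estimates mentioned in the abstract would replace g with a confidence-bound surrogate at each step; how the paper instantiates this is not specified here.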
