Timezone: »

Smoothed Adversarial Linear Contextual Bandits with Knapsacks
Vidyashankar Sivakumar · Shiliang Zuo · Arindam Banerjee

Tue Jul 19 07:50 AM -- 07:55 AM (PDT) @ Room 309
Many bandit problems are characterized by the learner making decisions under constraints. The learner in Linear Contextual Bandits with Knapsacks (LinCBwK) receives a resource consumption vector in addition to a scalar reward in each time step which are both linear functions of the context corresponding to the chosen arm. For a fixed time horizon $T$, the goal of the learner is to maximize rewards while ensuring resource consumptions do not exceed a pre-specified budget. We present algorithms and characterize regret for LinCBwK in the smoothed setting where base context vectors are assumed to be perturbed by Gaussian noise. We consider both the stochastic and adversarial settings for the base contexts, and our analysis of stochastic LinCBwK can be viewed as a warm-up to the more challenging adversarial LinCBwK. For the stochastic setting, we obtain $O(\sqrt{T})$ additive regret bounds compared to the best context dependent fixed policy. The analysis combines ideas for greedy parameter estimation in \cite{kmrw18, siwb20} and the primal-dual paradigm first explored in \cite{agde17, agde14}. Our main contribution is an algorithm with $O(\log T)$ competitive ratio relative to the best context dependent fixed policy for the adversarial setting. The algorithm for the adversarial setting employs ideas from the primal-dual framework \cite{agde17, agde14} and a novel adaptation of the doubling trick \cite{isss19}.

Author Information

Vidyashankar Sivakumar (Amazon)
Shiliang Zuo (University of Illinois Urbana-Champaign)
Arindam Banerjee (UIUC)
Arindam Banerjee

Arindam Banerjee is a Founder Professor at the Department of Computer Science, University of Illinois Urbana-Champaign. His research interests are in machine learning. His current research focuses on computational and statistical aspects of over-parameterized models including deep learning, spatial and temporal data analysis, generative models, and sequential decision making problems. His work also focuses on applications of machine learning in complex real-world and scientific domains including problems in climate science and ecology. He has won several awards, including the NSF CAREER award, the IBM Faculty Award, and six best paper awards in top-tier venues.

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors