Timezone: »
Poster
Smoothed Adversarial Linear Contextual Bandits with Knapsacks
Vidyashankar Sivakumar · Shiliang Zuo · Arindam Banerjee
Many bandit problems are characterized by the learner making decisions under constraints. The learner in Linear Contextual Bandits with Knapsacks (LinCBwK) receives a resource consumption vector in addition to a scalar reward in each time step which are both linear functions of the context corresponding to the chosen arm. For a fixed time horizon $T$, the goal of the learner is to maximize rewards while ensuring resource consumptions do not exceed a pre-specified budget. We present algorithms and characterize regret for LinCBwK in the smoothed setting where base context vectors are assumed to be perturbed by Gaussian noise. We consider both the stochastic and adversarial settings for the base contexts, and our analysis of stochastic LinCBwK can be viewed as a warm-up to the more challenging adversarial LinCBwK. For the stochastic setting, we obtain $O(\sqrt{T})$ additive regret bounds compared to the best context dependent fixed policy. The analysis combines ideas for greedy parameter estimation in \cite{kmrw18, siwb20} and the primal-dual paradigm first explored in \cite{agde17, agde14}. Our main contribution is an algorithm with $O(\log T)$ competitive ratio relative to the best context dependent fixed policy for the adversarial setting. The algorithm for the adversarial setting employs ideas from the primal-dual framework \cite{agde17, agde14} and a novel adaptation of the doubling trick \cite{isss19}.
Author Information
Vidyashankar Sivakumar (Amazon)
Shiliang Zuo (University of Illinois Urbana-Champaign)
Arindam Banerjee (UIUC)

Arindam Banerjee is a Founder Professor at the Department of Computer Science, University of Illinois Urbana-Champaign. His research interests are in machine learning. His current research focuses on computational and statistical aspects of over-parameterized models including deep learning, spatial and temporal data analysis, generative models, and sequential decision making problems. His work also focuses on applications of machine learning in complex real-world and scientific domains including problems in climate science and ecology. He has won several awards, including the NSF CAREER award, the IBM Faculty Award, and six best paper awards in top-tier venues.
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Smoothed Adversarial Linear Contextual Bandits with Knapsacks »
Tue. Jul 19th 02:50 -- 02:55 PM Room Room 309
More from the Same Authors
-
2022 Poster: Stability Based Generalization Bounds for Exponential Family Langevin Dynamics »
Arindam Banerjee · Tiancong Chen · Xinyan Li · Yingxue Zhou -
2022 Spotlight: Stability Based Generalization Bounds for Exponential Family Langevin Dynamics »
Arindam Banerjee · Tiancong Chen · Xinyan Li · Yingxue Zhou -
2020 Poster: Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis »
Vidyashankar Sivakumar · Steven Wu · Arindam Banerjee -
2017 Poster: High-Dimensional Structured Quantile Regression »
Vidyashankar Sivakumar · Arindam Banerjee -
2017 Poster: Robust Structured Estimation with Single-Index Models »
Sheng Chen · Arindam Banerjee -
2017 Talk: Robust Structured Estimation with Single-Index Models »
Sheng Chen · Arindam Banerjee -
2017 Talk: High-Dimensional Structured Quantile Regression »
Vidyashankar Sivakumar · Arindam Banerjee