In this work we study safe sequential decision making in the setting of adversarial contextual bandits with sequential risk constraints. At each round, nature prepares a context, a cost for each arm, and a risk for each arm. The learner uses the context to pull an arm and receives the cost and risk of the pulled arm. Besides minimizing the cumulative cost, the learner must act safely: the average risk incurred over all pulled arms should not exceed a pre-defined threshold. To address this problem, we first study online convex programming in the full information setting, where in each round the learner receives an adversarial convex loss and a convex constraint. We develop a meta algorithm based on online mirror descent for the full information setting, and then extend it to the contextual bandit setting with sequential risk constraints using expert advice. Our algorithms achieve near-optimal regret in terms of the total cost while keeping the cumulative risk constraint violation sublinear. We support our theoretical results by demonstrating our algorithm on a simple simulated robotics reactive control task.
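The full information setting described above can be illustrated with a small primal-dual sketch. This is not the authors' algorithm: it is a generic instance of online mirror descent (the exponentiated-gradient form, over a distribution on arms) applied to a Lagrangian, with a projected gradient ascent step on a dual variable. The function name `primal_dual_eg`, the step sizes `eta` and `mu`, the threshold name `beta`, and the two-arm data in the usage example are all assumptions made for illustration.

```python
import math


def primal_dual_eg(costs, risks, beta, eta=0.1, mu=0.1):
    """Illustrative full-information primal-dual learner (hypothetical helper).

    costs, risks: one list of per-arm values for each round.
    beta: threshold on the average risk.
    eta, mu: primal and dual step sizes (assumed values, not from the paper).
    Returns (final weights, final dual variable, average cost, average risk).
    """
    k = len(costs[0])
    log_w = [0.0] * k   # unnormalized log-weights over arms
    lam = 0.0           # dual variable for the risk constraint
    total_cost = 0.0
    total_risk = 0.0
    rounds = 0

    def softmax(logits):
        m = max(logits)
        unnorm = [math.exp(v - m) for v in logits]
        z = sum(unnorm)
        return [u / z for u in unnorm]

    for c, r in zip(costs, risks):
        w = softmax(log_w)
        # Expected cost and risk of playing the current distribution.
        exp_cost = sum(wi * ci for wi, ci in zip(w, c))
        exp_risk = sum(wi * ri for wi, ri in zip(w, r))
        total_cost += exp_cost
        total_risk += exp_risk
        rounds += 1
        # Primal step: mirror descent with an entropy regularizer on the
        # Lagrangian cost + lam * (risk - beta); its gradient in w is c + lam * r.
        log_w = [lw - eta * (ci + lam * ri) for lw, ci, ri in zip(log_w, c, r)]
        # Dual step: gradient ascent on lam, projected onto lam >= 0.
        lam = max(0.0, lam + mu * (exp_risk - beta))

    return softmax(log_w), lam, total_cost / rounds, total_risk / rounds


if __name__ == "__main__":
    # Arm 0 is cheap but risky, arm 1 is costly but safe (made-up numbers).
    T = 2000
    costs = [[0.2, 0.8]] * T
    risks = [[0.9, 0.1]] * T
    w, lam, avg_cost, avg_risk = primal_dual_eg(costs, risks, beta=0.5)
    print("weights:", w, "lambda:", lam)
    print("average cost:", avg_cost, "average risk:", avg_risk)
```

In this sketch the dual variable grows whenever the expected risk exceeds `beta`, which makes risky arms look more expensive in the next primal step. Because each dual update adds `mu` times the current violation, the total violation is at most the final value of `lam` divided by `mu`, so a bounded dual variable keeps the average risk close to the threshold. The bandit extension via expert advice mentioned in the abstract is not covered here.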
Author Information
Wen Sun (Carnegie Mellon University)
Debadeepta Dey (Microsoft)
Ashish Kapoor (Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)
-
2017 Talk: Safety-Aware Algorithms for Adversarial Contextual Bandit »
Mon. Aug 7th 06:24 -- 06:42 AM Room C4.1
More from the Same Authors
-
2021 : Ranking Architectures by Feature Extraction Capabilities »
Debadeepta Dey · Shital Shah · Sebastien Bubeck -
2021 : Ranking Architectures by their Feature Extraction Capabilities »
Debadeepta Dey -
2021 Poster: Quantum algorithms for reinforcement learning with a generative model »
Daochen Wang · Aarthi Sundaram · Robin Kothari · Ashish Kapoor · Martin Roetteler -
2021 Spotlight: Quantum algorithms for reinforcement learning with a generative model »
Daochen Wang · Aarthi Sundaram · Robin Kothari · Ashish Kapoor · Martin Roetteler -
2021 Poster: Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size »
Jack Kosaian · Amar Phanishayee · Matthai Philipose · Debadeepta Dey · Rashmi Vinayak -
2021 Spotlight: Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size »
Jack Kosaian · Amar Phanishayee · Matthai Philipose · Debadeepta Dey · Rashmi Vinayak -
2019 : Poster Session 1 (all papers) »
Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel -
2019 Poster: Provably Efficient Imitation Learning from Observation Alone »
Wen Sun · Anirudh Vemula · Byron Boots · Drew Bagnell -
2019 Oral: Provably Efficient Imitation Learning from Observation Alone »
Wen Sun · Anirudh Vemula · Byron Boots · Drew Bagnell -
2019 Poster: Contextual Memory Trees »
Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro -
2019 Oral: Contextual Memory Trees »
Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro -
2018 Poster: Recurrent Predictive State Policy Networks »
Ahmed Hefny · Zita Marinho · Wen Sun · Siddhartha Srinivasa · Geoff Gordon -
2018 Oral: Recurrent Predictive State Policy Networks »
Ahmed Hefny · Zita Marinho · Wen Sun · Siddhartha Srinivasa · Geoff Gordon -
2017 Poster: Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction »
Wen Sun · Arun Venkatraman · Geoff Gordon · Byron Boots · Drew Bagnell -
2017 Talk: Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction »
Wen Sun · Arun Venkatraman · Geoff Gordon · Byron Boots · Drew Bagnell