We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints. Each arm is associated with an unknown law on safety risks and rewards, and the learner's goal is to maximise reward whilst not playing unsafe arms, as determined by a given threshold on the mean risk. We formulate a pseudo-regret for this setting that enforces this safety constraint in a per-round way by softly penalising any violation, regardless of the gain in reward due to the same. This has practical relevance to scenarios such as clinical trials, where one must maintain safety for each round rather than in an aggregated sense. We describe doubly optimistic strategies for this scenario, which maintain optimistic indices for both safety risk and reward. We show that schema based on both frequentist and Bayesian indices satisfy tight gap-dependent logarithmic regret bounds, and further that these play unsafe arms only logarithmically many times in total. This theoretical analysis is complemented by simulation studies demonstrating the effectiveness of the proposed schema, and probing the domains in which their use is appropriate.
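The doubly optimistic idea in the abstract can be illustrated with a minimal frequentist sketch. This is not the authors' algorithm: the function name `doubly_optimistic_ucb`, the Bernoulli arm model, the shared bonus `sqrt(2 log t / n)`, and the fallback used when no arm looks safe are all assumptions made for illustration. Each round, an arm counts as plausibly safe if a lower confidence bound on its mean risk is within the threshold, and the plausibly safe arm with the largest upper confidence bound on reward is played.

```python
import math
import random


def doubly_optimistic_ucb(arms, threshold, horizon, seed=0):
    """Illustrative sketch only (assumed setup, not the paper's method).

    arms: list of (reward_prob, risk_prob) pairs; rewards and risks are
          simulated as independent Bernoulli draws (an assumption).
    threshold: largest mean risk that still counts as safe.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k
    reward_sum = [0.0] * k
    risk_sum = [0.0] * k

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # Assumed UCB1-style confidence width, shared by both indices.
            bonus = [math.sqrt(2.0 * math.log(t) / pulls[i]) for i in range(k)]
            # Optimism about safety: lower confidence bound on mean risk.
            risk_lcb = [risk_sum[i] / pulls[i] - bonus[i] for i in range(k)]
            # Optimism about reward: upper confidence bound on mean reward.
            reward_ucb = [reward_sum[i] / pulls[i] + bonus[i] for i in range(k)]
            plausibly_safe = [i for i in range(k) if risk_lcb[i] <= threshold]
            if plausibly_safe:
                arm = max(plausibly_safe, key=lambda i: reward_ucb[i])
            else:
                # Assumed fallback: play the arm that looks least risky.
                arm = min(range(k), key=lambda i: risk_lcb[i])

        reward_prob, risk_prob = arms[arm]
        reward_sum[arm] += 1.0 if rng.random() < reward_prob else 0.0
        risk_sum[arm] += 1.0 if rng.random() < risk_prob else 0.0
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    # Hypothetical instance: arm 0 pays best but is unsafe at threshold 0.5.
    print(doubly_optimistic_ucb([(0.9, 0.8), (0.6, 0.2), (0.3, 0.1)], 0.5, 5000))
```

In this hypothetical instance the first arm has the highest mean reward but a mean risk above the threshold, so a reasonable strategy should end up concentrating its plays on the second arm, the best arm that is actually safe.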
Author Information
Tianrui Chen (Boston University)
Aditya Gangrade (Carnegie Mellon University)
Venkatesh Saligrama (Boston University)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk »
Thu, Jul 21 through Fri, Jul 22, Hall E #1314
More from the Same Authors
-
2022 : Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk »
Tianrui Chen · Aditya Gangrade · Venkatesh Saligrama -
2022 : ActiveHedge: Hedge meets Active Learning »
Bhuvesh Kumar · Jacob Abernethy · Venkatesh Saligrama -
2022 : Acting Optimistically in Choosing Safe Actions »
Tianrui Chen · Aditya Gangrade · Venkatesh Saligrama -
2022 : Achieving High TinyML Accuracy through Selective Cloud Interactions »
Anil Kag · Igor Fedorov · Aditya Gangrade · Paul Whatmough · Venkatesh Saligrama -
2022 : FedHeN: Federated Learning in Heterogeneous Networks »
Durmus Alp Emre Acar · Venkatesh Saligrama -
2022 Workshop: The 1st Workshop on Healthcare AI and COVID-19 »
Peng Xu · Tingting Zhu · Pengkai Zhu · Tianrui Chen · David Clifton · Danielle Belgrave · Yuanting Zhang -
2022 Poster: Faster Algorithms for Learning Convex Functions »
Ali Siahkamari · Durmus Alp Emre Acar · Christopher Liao · Kelly Geyer · Venkatesh Saligrama · Brian Kulis -
2022 Poster: ActiveHedge: Hedge meets Active Learning »
Bhuvesh Kumar · Jacob Abernethy · Venkatesh Saligrama -
2022 Spotlight: ActiveHedge: Hedge meets Active Learning »
Bhuvesh Kumar · Jacob Abernethy · Venkatesh Saligrama -
2022 Spotlight: Faster Algorithms for Learning Convex Functions »
Ali Siahkamari · Durmus Alp Emre Acar · Christopher Liao · Kelly Geyer · Venkatesh Saligrama · Brian Kulis -
2021 Poster: Debiasing Model Updates for Improving Personalized Federated Training »
Durmus Alp Emre Acar · Yue Zhao · Ruizhao Zhu · Ramon Matas · Matthew Mattina · Paul Whatmough · Venkatesh Saligrama -
2021 Spotlight: Debiasing Model Updates for Improving Personalized Federated Training »
Durmus Alp Emre Acar · Yue Zhao · Ruizhao Zhu · Ramon Matas · Matthew Mattina · Paul Whatmough · Venkatesh Saligrama -
2021 Poster: Memory Efficient Online Meta Learning »
Durmus Alp Emre Acar · Ruizhao Zhu · Venkatesh Saligrama -
2021 Spotlight: Memory Efficient Online Meta Learning »
Durmus Alp Emre Acar · Ruizhao Zhu · Venkatesh Saligrama -
2021 Poster: Training Recurrent Neural Networks via Forward Propagation Through Time »
Anil Kag · Venkatesh Saligrama -
2021 Spotlight: Training Recurrent Neural Networks via Forward Propagation Through Time »
Anil Kag · Venkatesh Saligrama -
2020 Poster: Piecewise Linear Regression via a Difference of Convex Functions »
Ali Siahkamari · Aditya Gangrade · Brian Kulis · Venkatesh Saligrama -
2020 Poster: Minimax Rate for Learning From Pairwise Comparisons in the BTL Model »
Julien Hendrickx · Alex Olshevsky · Venkatesh Saligrama -
2019 Poster: Graph Resistance and Learning from Pairwise Comparisons »
Julien Hendrickx · Alex Olshevsky · Venkatesh Saligrama -
2019 Oral: Graph Resistance and Learning from Pairwise Comparisons »
Julien Hendrickx · Alex Olshevsky · Venkatesh Saligrama -
2019 Poster: Learning Classifiers for Target Domain with Limited or No Labels »
Pengkai Zhu · Hanxiao Wang · Venkatesh Saligrama -
2019 Oral: Learning Classifiers for Target Domain with Limited or No Labels »
Pengkai Zhu · Hanxiao Wang · Venkatesh Saligrama -
2018 Poster: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers »
Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama -
2018 Oral: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers »
Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama -
2017 Workshop: ML on a budget: IoT, Mobile and other tiny-ML applications »
Manik Varma · Venkatesh Saligrama · Prateek Jain -
2017 Poster: Adaptive Neural Networks for Efficient Inference »
Tolga Bolukbasi · Joseph Wang · Ofer Dekel · Venkatesh Saligrama -
2017 Talk: Adaptive Neural Networks for Efficient Inference »
Tolga Bolukbasi · Joseph Wang · Ofer Dekel · Venkatesh Saligrama -
2017 Poster: Connected Subgraph Detection with Mirror Descent on SDPs »
Cem Aksoylar · Lorenzo Orecchia · Venkatesh Saligrama -
2017 Talk: Connected Subgraph Detection with Mirror Descent on SDPs »
Cem Aksoylar · Lorenzo Orecchia · Venkatesh Saligrama