Timezone: »
Poster
Multiplier Bootstrap-based Exploration
Runzhe Wan · Haoyu Wei · Branislav Kveton · Rui Song
Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty. In this paper, we propose Multiplier Bootstrap-based Exploration (MBE), a novel exploration strategy that is applicable to any reward model amenable to weighted loss minimization. We prove both instance-dependent and instance-independent rate-optimal regret bounds for MBE in sub-Gaussian multi-armed bandits. With extensive simulation and real-data experiments, we show the generality and adaptivity of MBE.
Author Information
Runzhe Wan (Amazon)
Haoyu Wei
Branislav Kveton (AWS AI Labs)
Rui Song (Amazon Inc)
More from the Same Authors
-
2023 : Active Learning with Crowd Sourcing Improves Information Retrieval »
Zhuotong Chen · Yifei Ma · Branislav Kveton · Anoop Deoras -
2023 Workshop: The Many Facets of Preference-Based Learning »
Aadirupa Saha · Mohammad Ghavamzadeh · Robert Busa-Fekete · Branislav Kveton · Viktor Bengs -
2023 Poster: Thompson Sampling with Diffusion Generative Prior »
Yu-Guan Hsieh · Shiva Kasiviswanathan · Branislav Kveton · Patrick Bloebaum -
2023 Poster: Multi-Task Off-Policy Learning from Bandit Feedback »
Joey Hong · Branislav Kveton · Manzil Zaheer · Sumeet Katariya · Mohammad Ghavamzadeh -
2023 Poster: An Instrumental Variable Approach to Confounded Off-Policy Evaluation »
Yang Xu · Jin Zhu · Chengchun Shi · Shikai Luo · Rui Song -
2023 Poster: On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs »
Richard Watson · Hengrui Cai · Xinming An · Samuel McLean · Rui Song -
2023 Poster: A Reinforcement Learning Framework for Dynamic Mediation Analysis »
Lin Ge · Jitao Wang · Chengchun Shi · Zhenke Wu · Rui Song -
2022 Poster: Safe Exploration for Efficient Policy Evaluation and Comparison »
Runzhe Wan · Branislav Kveton · Rui Song -
2022 Poster: Deep Hierarchy in Bandits »
Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh -
2022 Spotlight: Deep Hierarchy in Bandits »
Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh -
2022 Spotlight: Safe Exploration for Efficient Policy Evaluation and Comparison »
Runzhe Wan · Branislav Kveton · Rui Song -
2021 Poster: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Spotlight: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Poster: Deeply-Debiased Off-Policy Interval Estimation »
Chengchun Shi · Runzhe Wan · Victor Chernozhukov · Rui Song -
2021 Oral: Deeply-Debiased Off-Policy Interval Estimation »
Chengchun Shi · Runzhe Wan · Victor Chernozhukov · Rui Song -
2020 Poster: Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making »
Chengchun Shi · Runzhe Wan · Rui Song · Wenbin Lu · Ling Leng -
2020 Poster: Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems »
Tong Yu · Branislav Kveton · Zheng Wen · Ruiyi Zhang · Ole J. Mengshoel -
2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2017 Poster: Model-Independent Online Learning for Influence Maximization »
Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt -
2017 Poster: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Talk: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Talk: Model-Independent Online Learning for Influence Maximization »
Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt