Poster
Safe Exploration for Efficient Policy Evaluation and Comparison
Runzhe Wan · Branislav Kveton · Rui Song
High-quality data plays a central role in ensuring the accuracy of policy evaluation. This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate several of its representative variants. For each variant, we analyze its statistical properties, derive the corresponding exploration policy, and design an efficient algorithm for computing it. Both theoretical analysis and experiments support the usefulness of the proposed methods.
Author Information
Runzhe Wan (Amazon)
Branislav Kveton (Google Research)
Rui Song (North Carolina State University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Safe Exploration for Efficient Policy Evaluation and Comparison
  Thu. Jul 21st 07:45 -- 07:50 PM Room 301 - 303
More from the Same Authors
- 2023 : Active Learning with Crowd Sourcing Improves Information Retrieval
  Zhuotong Chen · Yifei Ma · Branislav Kveton · Anoop Deoras
- 2023 Workshop: The Many Facets of Preference-Based Learning
  Aadirupa Saha · Mohammad Ghavamzadeh · Robert Busa-Fekete · Branislav Kveton · Viktor Bengs
- 2023 Poster: Thompson Sampling with Diffusion Generative Prior
  Yu-Guan Hsieh · Shiva Kasiviswanathan · Branislav Kveton · Patrick Bloebaum
- 2023 Poster: Multiplier Bootstrap-based Exploration
  Runzhe Wan · Haoyu Wei · Branislav Kveton · Rui Song
- 2023 Poster: Multi-Task Off-Policy Learning from Bandit Feedback
  Joey Hong · Branislav Kveton · Manzil Zaheer · Sumeet Katariya · Mohammad Ghavamzadeh
- 2023 Poster: An Instrumental Variable Approach to Confounded Off-Policy Evaluation
  Yang Xu · Jin Zhu · Chengchun Shi · Shikai Luo · Rui Song
- 2023 Poster: On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs
  Richard Watson · Hengrui Cai · Xinming An · Samuel McLean · Rui Song
- 2023 Poster: A Reinforcement Learning Framework for Dynamic Mediation Analysis
  Lin Ge · Jitao Wang · Chengchun Shi · Zhenke Wu · Rui Song
- 2022 Poster: Deep Hierarchy in Bandits
  Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh
- 2022 Spotlight: Deep Hierarchy in Bandits
  Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh
- 2021 Poster: Meta-Thompson Sampling
  Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari
- 2021 Spotlight: Meta-Thompson Sampling
  Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari
- 2020 Poster: Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
  Tong Yu · Branislav Kveton · Zheng Wen · Ruiyi Zhang · Ole J. Mengshoel
- 2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
  Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh
- 2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
  Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh
- 2017 Poster: Model-Independent Online Learning for Influence Maximization
  Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt
- 2017 Poster: Online Learning to Rank in Stochastic Click Models
  Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen
- 2017 Talk: Online Learning to Rank in Stochastic Click Models
  Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen
- 2017 Talk: Model-Independent Online Learning for Influence Maximization
  Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt