In this work, we introduce the idea of using denoising diffusion models to learn priors for online decision-making problems. We focus on bandit meta-learning, where the goal is to learn a policy that performs well across bandit tasks of the same class. To this end, we train a diffusion model that learns the underlying task distribution, and we combine Thompson sampling with the learned prior to handle new tasks at test time. Our posterior sampling algorithm balances the learned prior against the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we propose a novel diffusion model training procedure that trains from incomplete and noisy data, which could be of independent interest. Finally, our experiments demonstrate the potential of the proposed approach.
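As a rough illustration of how Thompson sampling trades off a prior against noisy rewards, the sketch below uses a simple independent Gaussian prior per arm with a conjugate update. This is not the diffusion-based posterior sampling described in the abstract: the Gaussian prior is a stand-in for a learned prior, and the function name, parameters, and reward model are all assumptions made for illustration only.

```python
import math
import random


def gaussian_thompson_sampling(true_means, prior_means, prior_vars,
                               noise_var, horizon, seed=0):
    """Thompson sampling with an independent Gaussian prior on each arm.

    The Gaussian prior is a hypothetical stand-in for a learned prior;
    rewards are assumed Gaussian with known variance `noise_var`.
    """
    rng = random.Random(seed)
    post_means = list(prior_means)
    post_vars = list(prior_vars)
    pulls = [0] * len(true_means)

    for _ in range(horizon):
        # Draw one plausible mean reward per arm from the current posterior.
        samples = [rng.gauss(m, math.sqrt(v))
                   for m, v in zip(post_means, post_vars)]
        # Act greedily with respect to the sampled means.
        arm = max(range(len(samples)), key=samples.__getitem__)
        # Observe a noisy reward from the (simulated) environment.
        reward = rng.gauss(true_means[arm], math.sqrt(noise_var))

        # Conjugate Gaussian update: precisions add, and the posterior mean
        # is a precision-weighted average of the prior mean and the reward.
        precision = 1.0 / post_vars[arm] + 1.0 / noise_var
        post_means[arm] = (post_means[arm] / post_vars[arm]
                           + reward / noise_var) / precision
        post_vars[arm] = 1.0 / precision
        pulls[arm] += 1

    return pulls, post_means, post_vars


# Two-armed example: arm 1 is better, and the prior does not favor either arm.
pulls, post_means, post_vars = gaussian_thompson_sampling(
    true_means=[0.0, 1.0],
    prior_means=[0.5, 0.5],
    prior_vars=[1.0, 1.0],
    noise_var=0.1,
    horizon=200,
)
```

In this simplified setting, a more informative prior (a smaller prior variance centered near the true means) lets the learner commit to the better arm after fewer pulls, which is the intuition behind learning the prior from related tasks.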
Author Information
Yu-Guan Hsieh (University of Grenoble-Alpes)
Shiva Kasiviswanathan (Amazon)
Branislav Kveton (AWS AI Labs)
Patrick Bloebaum (Amazon Web Services)
More from the Same Authors
-
2023 : Interventional and Counterfactual Inference with Diffusion Models »
Patrick Chao · Patrick Bloebaum · Shiva Kasiviswanathan -
2023 : Active Learning with Crowd Sourcing Improves Information Retrieval »
Zhuotong Chen · Yifei Ma · Branislav Kveton · Anoop Deoras -
2023 Workshop: The Many Facets of Preference-Based Learning »
Aadirupa Saha · Mohammad Ghavamzadeh · Robert Busa-Fekete · Branislav Kveton · Viktor Bengs -
2023 Poster: Multiplier Bootstrap-based Exploration »
Runzhe Wan · Haoyu Wei · Branislav Kveton · Rui Song -
2023 Poster: Multi-Task Off-Policy Learning from Bandit Feedback »
Joey Hong · Branislav Kveton · Manzil Zaheer · Sumeet Katariya · Mohammad Ghavamzadeh -
2023 Poster: Sequential Kernelized Independence Testing »
Aleksandr Podkopaev · Patrick Bloebaum · Shiva Kasiviswanathan · Aaditya Ramdas -
2023 Poster: Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism »
Yu-Guan Hsieh · Franck Iutzeler · Jérôme Malick · Panayotis Mertikopoulos -
2022 Poster: Safe Exploration for Efficient Policy Evaluation and Comparison »
Runzhe Wan · Branislav Kveton · Rui Song -
2022 Poster: Deep Hierarchy in Bandits »
Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh -
2022 Spotlight: Deep Hierarchy in Bandits »
Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh -
2022 Spotlight: Safe Exploration for Efficient Policy Evaluation and Comparison »
Runzhe Wan · Branislav Kveton · Rui Song -
2022 Poster: On Measuring Causal Contributions via do-interventions »
Yonghan Jung · Shiva Kasiviswanathan · Jin Tian · Dominik Janzing · Patrick Bloebaum · Elias Bareinboim -
2022 Poster: Causal structure-based root cause analysis of outliers »
Kailash Budhathoki · Lenon Minorics · Patrick Bloebaum · Dominik Janzing -
2022 Spotlight: Causal structure-based root cause analysis of outliers »
Kailash Budhathoki · Lenon Minorics · Patrick Bloebaum · Dominik Janzing -
2022 Spotlight: On Measuring Causal Contributions via do-interventions »
Yonghan Jung · Shiva Kasiviswanathan · Jin Tian · Dominik Janzing · Patrick Bloebaum · Elias Bareinboim -
2021 Poster: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Spotlight: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2020 Poster: Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems »
Tong Yu · Branislav Kveton · Zheng Wen · Ruiyi Zhang · Ole J. Mengshoel -
2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2017 Poster: Model-Independent Online Learning for Influence Maximization »
Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt -
2017 Poster: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Talk: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Talk: Model-Independent Online Learning for Influence Maximization »
Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt