Timezone: »
Many practical problems involve solving similar tasks. In recommender systems, the tasks can be users with similar preferences; in search engines, the tasks can be items with similar affinities. To learn statistically efficiently, the tasks can be organized in a hierarchy, where the task affinity is captured using an unknown latent parameter. We study the problem of off-policy learning for similar tasks from logged bandit feedback. To solve the problem, we propose a hierarchical off-policy optimization algorithm HierOPO. The key idea is to estimate the task parameters using the hierarchy and then act pessimistically with respect to them. To analyze the algorithm, we develop novel Bayesian error bounds. Our bounds are the first in off-policy learning that improve with a more informative prior and capture statistical gains due to hierarchical models. Therefore, they are of a general interest. HierOPO also performs well in practice. Our experiments demonstrate the benefits of using the hierarchy over solving each task independently.
Author Information
Joey Hong (Berkeley)
Branislav Kveton (AWS AI Labs)
Manzil Zaheer (Google DeepMind)
Sumeet Katariya (Amazon)
Mohammad Ghavamzadeh (Google Research)
More from the Same Authors
-
2022 : Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms »
MohammadJavad Azizi · Thang Duong · Yasin Abbasi-Yadkori · Claire Vernade · Andras Gyorgy · Mohammad Ghavamzadeh -
2023 : Algorithms for Optimal Adaptation of Diffusion Models to Reward Functions »
Krishnamurthy Dvijotham · Shayegan Omidshafiei · Kimin Lee · Katie Collins · Deepak Ramachandran · Adrian Weller · Mohammad Ghavamzadeh · Milad Nasresfahani · Ying Fan · Jeremiah Liu -
2023 : Can Public Large Language Models Help Private Cross-device Federated Learning? »
Boxin Wang · Yibo J. Zhang · Yuan Cao · Bo Li · Hugh B McMahan · Sewoong Oh · Zheng Xu · Manzil Zaheer -
2023 : Can Public Large Language Models Help Private Cross-device Federated Learning? »
Boxin Wang · Yibo J. Zhang · Yuan Cao · Bo Li · Hugh B McMahan · Sewoong Oh · Zheng Xu · Manzil Zaheer -
2023 : Active Learning with Crowd Sourcing Improves Information Retrieval »
Zhuotong Chen · Yifei Ma · Branislav Kveton · Anoop Deoras -
2023 Workshop: The Many Facets of Preference-Based Learning »
Aadirupa Saha · Mohammad Ghavamzadeh · Robert Busa-Fekete · Branislav Kveton · Viktor Bengs -
2023 Poster: Thompson Sampling with Diffusion Generative Prior »
Yu-Guan Hsieh · Shiva Kasiviswanathan · Branislav Kveton · Patrick Bloebaum -
2023 Poster: Multiplier Bootstrap-based Exploration »
Runzhe Wan · Haoyu Wei · Branislav Kveton · Rui Song -
2023 Poster: A Statistical Perspective on Retrieval-Based Models »
Soumya Basu · Ankit Singh Rawat · Manzil Zaheer -
2022 Poster: Safe Exploration for Efficient Policy Evaluation and Comparison »
Runzhe Wan · Branislav Kveton · Rui Song -
2022 Poster: Deep Hierarchy in Bandits »
Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh -
2022 Spotlight: Deep Hierarchy in Bandits »
Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh -
2022 Spotlight: Safe Exploration for Efficient Policy Evaluation and Comparison »
Runzhe Wan · Branislav Kveton · Rui Song -
2022 Poster: Feature and Parameter Selection in Stochastic Linear Bandits »
Ahmadreza Moradipari · Berkay Turan · Yasin Abbasi-Yadkori · Mahnoosh Alizadeh · Mohammad Ghavamzadeh -
2022 Spotlight: Feature and Parameter Selection in Stochastic Linear Bandits »
Ahmadreza Moradipari · Berkay Turan · Yasin Abbasi-Yadkori · Mahnoosh Alizadeh · Mohammad Ghavamzadeh -
2021 Poster: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Spotlight: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Poster: PID Accelerated Value Iteration Algorithm »
Amir-massoud Farahmand · Mohammad Ghavamzadeh -
2021 Spotlight: PID Accelerated Value Iteration Algorithm »
Amir-massoud Farahmand · Mohammad Ghavamzadeh -
2020 Poster: Robust Outlier Arm Identification »
Yinglun Zhu · Sumeet Katariya · Robert Nowak -
2020 Poster: Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems »
Tong Yu · Branislav Kveton · Zheng Wen · Ruiyi Zhang · Ole J. Mengshoel -
2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2017 Poster: Model-Independent Online Learning for Influence Maximization »
Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt -
2017 Poster: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Talk: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Talk: Model-Independent Online Learning for Influence Maximization »
Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt