Timezone: »
Poster
Meta-Thompson Sampling
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari
Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning and is of a broader interest, because we derive a novel prior-dependent Bayes regret bound for Thompson sampling. Our theory is complemented by empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior.
Author Information
Branislav Kveton (Google Research)
Mikhail Konobeev (University of Alberta)
Manzil Zaheer (Google Research)
Chih-wei Hsu (Google Research)
Martin Mladenov (Google)
Craig Boutilier (Google)
Csaba Szepesvari (DeepMind/University of Alberta)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Meta-Thompson Sampling »
Fri. Jul 23rd 01:30 -- 01:35 AM Room
More from the Same Authors
-
2023 : Preference Elicitation for Music Recommendations »
Ofer Meshi · Jon Feldman · Li Yang · Ben Scheetz · Yanli Cai · Mohammad Hossein Bateni · Corbyn Salisbury · Vikram Aggarwal · Craig Boutilier -
2023 Poster: Stochastic Gradient Succeeds for Bandits »
Jincheng Mei · Zixin Zhong · Bo Dai · Alekh Agarwal · Csaba Szepesvari · Dale Schuurmans -
2023 Poster: Revisiting Simple Regret: Fast Rates for Returning a Good Arm »
Yao Zhao · Connor J Stephens · Csaba Szepesvari · Kwang-Sung Jun -
2023 Poster: Reinforcement Learning with History Dependent Dynamic Contexts »
Guy Tennenholtz · Nadav Merlis · Lior Shani · Martin Mladenov · Craig Boutilier -
2023 Poster: The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation »
Philip Amortila · Nan Jiang · Csaba Szepesvari -
2023 Poster: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice »
Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo -
2022 : Modeling Recommender Ecosystems - Some Considerations »
Craig Boutilier -
2022 Poster: Deep Hierarchy in Bandits »
Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh -
2022 Poster: A Context-Integrated Transformer-Based Neural Network for Auction Design »
Zhijian Duan · Jingwu Tang · Yutong Yin · Zhe Feng · Xiang Yan · Manzil Zaheer · Xiaotie Deng -
2022 Spotlight: Deep Hierarchy in Bandits »
Joey Hong · Branislav Kveton · Sumeet Katariya · Manzil Zaheer · Mohammad Ghavamzadeh -
2022 Spotlight: A Context-Integrated Transformer-Based Neural Network for Auction Design »
Zhijian Duan · Jingwu Tang · Yutong Yin · Zhe Feng · Xiang Yan · Manzil Zaheer · Xiaotie Deng -
2022 Poster: Private Adaptive Optimization with Side information »
Tian Li · Manzil Zaheer · Sashank Jakkam Reddi · Virginia Smith -
2022 Poster: Robust Training of Neural Networks Using Scale Invariant Architectures »
Zhiyuan Li · Srinadh Bhojanapalli · Manzil Zaheer · Sashank Jakkam Reddi · Sanjiv Kumar -
2022 Spotlight: Private Adaptive Optimization with Side information »
Tian Li · Manzil Zaheer · Sashank Jakkam Reddi · Virginia Smith -
2022 Oral: Robust Training of Neural Networks Using Scale Invariant Architectures »
Zhiyuan Li · Srinadh Bhojanapalli · Manzil Zaheer · Sashank Jakkam Reddi · Sanjiv Kumar -
2022 Poster: StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models »
Adam Liska · Tomas Kocisky · Elena Gribovskaya · Tayfun Terzi · Eren Sezener · Devang Agrawal · Cyprien de Masson d'Autume · Tim Scholtes · Manzil Zaheer · Susannah Young · Ellen Gilsenan-McMahon · Sophia Austin · Phil Blunsom · Angeliki Lazaridou -
2022 Poster: Knowledge Base Question Answering by Case-based Reasoning over Subgraphs »
Rajarshi Das · Ameya Godbole · Ankita Rajaram Naik · Elliot Tower · Manzil Zaheer · Hannaneh Hajishirzi · Robin Jia · Andrew McCallum -
2022 Spotlight: Knowledge Base Question Answering by Case-based Reasoning over Subgraphs »
Rajarshi Das · Ameya Godbole · Ankita Rajaram Naik · Elliot Tower · Manzil Zaheer · Hannaneh Hajishirzi · Robin Jia · Andrew McCallum -
2022 Spotlight: StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models »
Adam Liska · Tomas Kocisky · Elena Gribovskaya · Tayfun Terzi · Eren Sezener · Devang Agrawal · Cyprien de Masson d'Autume · Tim Scholtes · Manzil Zaheer · Susannah Young · Ellen Gilsenan-McMahon · Sophia Austin · Phil Blunsom · Angeliki Lazaridou -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 : RL Foundation Panel »
Matthew Botvinick · Thomas Dietterich · Leslie Kaelbling · John Langford · Warrren B Powell · Csaba Szepesvari · Lihong Li · Yuxi Li -
2021 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Minmin Chen · Omer Gottesman · Lihong Li · Zongqing Lu · Rupam Mahmood · Niranjani Prasad · Zhiwei (Tony) Qin · Csaba Szepesvari · Matthew Taylor -
2021 Poster: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient »
Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang -
2021 Poster: Improved Regret Bound and Experience Replay in Regularized Policy Iteration »
Nevena Lazic · Dong Yin · Yasin Abbasi-Yadkori · Csaba Szepesvari -
2021 Poster: Leveraging Non-uniformity in First-order Non-convex Optimization »
Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: Latent Programmer: Discrete Latent Codes for Program Synthesis »
Joey Hong · David Dohan · Rishabh Singh · Charles Sutton · Manzil Zaheer -
2021 Poster: A Distribution-dependent Analysis of Meta Learning »
Mikhail Konobeev · Ilja Kuzborskij · Csaba Szepesvari -
2021 Oral: Latent Programmer: Discrete Latent Codes for Program Synthesis »
Joey Hong · David Dohan · Rishabh Singh · Charles Sutton · Manzil Zaheer -
2021 Oral: Improved Regret Bound and Experience Replay in Regularized Policy Iteration »
Nevena Lazic · Dong Yin · Yasin Abbasi-Yadkori · Csaba Szepesvari -
2021 Spotlight: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient »
Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang -
2021 Spotlight: Leveraging Non-uniformity in First-order Non-convex Optimization »
Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Spotlight: A Distribution-dependent Analysis of Meta Learning »
Mikhail Konobeev · Ilja Kuzborskij · Csaba Szepesvari -
2021 Poster: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference »
Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang -
2021 Spotlight: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference »
Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang -
2021 Poster: Federated Composite Optimization »
Honglin Yuan · Manzil Zaheer · Sashank Jakkam Reddi -
2021 Spotlight: Federated Composite Optimization »
Honglin Yuan · Manzil Zaheer · Sashank Jakkam Reddi -
2021 Town Hall: Town Hall »
John Langford · Marina Meila · Tong Zhang · Le Song · Stefanie Jegelka · Csaba Szepesvari -
2021 Poster: On the Optimality of Batch Policy Optimization Algorithms »
Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2021 Spotlight: On the Optimality of Batch Policy Optimization Algorithms »
Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 : Efficient Planning in Large MDPs with Weak Linear Function Approximation - Csaba Szepesvari »
Csaba Szepesvari -
2020 : Speaker Panel »
Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy -
2020 Poster: On the Global Convergence Rates of Softmax Policy Gradient Methods »
Jincheng Mei · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: ConQUR: Mitigating Delusional Bias in Deep Q-Learning »
DiJia Su · Jayden Ooi · Tyler Lu · Dale Schuurmans · Craig Boutilier -
2020 Poster: Model-Based Reinforcement Learning with Value-Targeted Regression »
Alex Ayoub · Zeyu Jia · Csaba Szepesvari · Mengdi Wang · Lin Yang -
2020 Poster: Learning with Good Feature Representations in Bandits and in RL with a Generative Model »
Tor Lattimore · Csaba Szepesvari · Gellért Weisz -
2020 Poster: Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems »
Tong Yu · Branislav Kveton · Zheng Wen · Ruiyi Zhang · Ole J. Mengshoel -
2020 Poster: A simpler approach to accelerated optimization: iterative averaging meets optimism »
Pooria Joulani · Anant Raj · Andras Gyorgy · Csaba Szepesvari -
2020 Poster: Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach »
Martin Mladenov · Elliot Creager · Omer Ben-Porat · Kevin Swersky · Richard Zemel · Craig Boutilier -
2019 : panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (Deepmind) »
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh -
2019 : posters »
Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · · Craig Boutilier -
2019 : invited talk by Craig Boutilier (Google Research): Reinforcement Learning in Recommender Systems: Some Challenges »
Craig Boutilier -
2019 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Alborz Geramifard · Lihong Li · Csaba Szepesvari · Tao Wang -
2019 Poster: POLITEX: Regret Bounds for Policy Iteration using Expert Prediction »
Yasin Abbasi-Yadkori · Peter Bartlett · Kush Bhatia · Nevena Lazic · Csaba Szepesvari · Gellért Weisz -
2019 Oral: POLITEX: Regret Bounds for Policy Iteration using Expert Prediction »
Yasin Abbasi-Yadkori · Peter Bartlett · Kush Bhatia · Nevena Lazic · Csaba Szepesvari · Gellért Weisz -
2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2019 Poster: Online Learning to Rank with Features »
Shuai Li · Tor Lattimore · Csaba Szepesvari -
2019 Oral: Online Learning to Rank with Features »
Shuai Li · Tor Lattimore · Csaba Szepesvari -
2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2019 Poster: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2019 Oral: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2018 Poster: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers »
Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama -
2018 Oral: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers »
Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama -
2018 Poster: Bandits with Delayed, Aggregated Anonymous Feedback »
Ciara Pike-Burke · Shipra Agrawal · Csaba Szepesvari · Steffen Grünewälder -
2018 Oral: Bandits with Delayed, Aggregated Anonymous Feedback »
Ciara Pike-Burke · Shipra Agrawal · Csaba Szepesvari · Steffen Grünewälder -
2018 Poster: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2018 Oral: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2017 Poster: Model-Independent Online Learning for Influence Maximization »
Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt -
2017 Poster: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Talk: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Talk: Model-Independent Online Learning for Influence Maximization »
Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt