Batch policy optimization considers leveraging existing data for policy construction before interacting with an environment. Although interest in this problem has grown significantly in recent years, its theoretical foundations remain under-developed. To advance the understanding of this problem, we provide three results that characterize the limits and possibilities of batch policy optimization in the finite-armed stochastic bandit setting. First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis. For this family, we show that any confidence-adjusted index algorithm is minimax optimal, whether it be optimistic, pessimistic or neutral. Our analysis reveals that instance-dependent optimality, commonly used to establish optimality of on-line stochastic bandit algorithms, cannot be achieved by any algorithm in the batch setting. In particular, for any algorithm that performs optimally in some environment, there exists another environment where the same algorithm suffers arbitrarily larger regret. Therefore, to establish a framework for distinguishing algorithms, we introduce a new weighted-minimax criterion that considers the inherent difficulty of optimal value prediction. We demonstrate how this criterion can be used to justify commonly used pessimistic principles for batch policy optimization.
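To make the idea of a confidence-adjusted index concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the index formula, so the sketch assumes the index of an arm is its empirical mean plus a signed multiple `alpha` of a Hoeffding-style confidence width. The names `confidence_width`, `select_arm`, `alpha`, and `delta` are hypothetical. Under this assumption, `alpha < 0` gives a pessimistic rule (lower confidence bound), `alpha = 0` a neutral (greedy) rule, and `alpha > 0` an optimistic rule (upper confidence bound).

```python
import math


def confidence_width(n, k, delta=0.05):
    """Hoeffding-style width for an arm with n samples out of k arms.

    Assumption: rewards are bounded or sub-Gaussian with unit scale; the
    exact constant is illustrative and not taken from the paper.
    """
    if n <= 0:
        raise ValueError("every arm needs at least one sample in the batch")
    return math.sqrt(2.0 * math.log(k / delta) / n)


def select_arm(counts, means, alpha, delta=0.05):
    """Return the arm maximizing mean + alpha * width (hypothetical index).

    counts[i] is the number of batch samples for arm i and means[i] is its
    empirical mean reward. alpha < 0 is pessimistic, alpha = 0 is neutral,
    and alpha > 0 is optimistic.
    """
    k = len(counts)
    indices = [
        mu + alpha * confidence_width(n, k, delta)
        for n, mu in zip(counts, means)
    ]
    # Ties are broken in favor of the lowest arm number.
    return max(range(k), key=lambda i: indices[i])


if __name__ == "__main__":
    # Arm 0 is well sampled; arm 1 has a slightly higher mean from 2 samples.
    counts = [100, 2]
    means = [0.5, 0.6]
    for alpha in (-1.0, 0.0, 1.0):
        print(alpha, select_arm(counts, means, alpha))
```

In this toy batch the pessimistic rule prefers the well-sampled arm, while the neutral and optimistic rules pick the arm with the higher but poorly estimated mean, which is the kind of difference the weighted-minimax criterion is meant to distinguish.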
Author Information
Chenjun Xiao (Google / University of Alberta)
Yifan Wu (Carnegie Mellon University)
Jincheng Mei (University of Alberta / Google Brain)
Bo Dai (Google Brain)
Tor Lattimore (DeepMind)
Lihong Li (Amazon)
Csaba Szepesvari (DeepMind/University of Alberta)
Dale Schuurmans (Google / University of Alberta)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: On the Optimality of Batch Policy Optimization Algorithms »
Tue. Jul 20th 04:00 -- 06:00 PM Room
More from the Same Authors
-
2022 : SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition »
Dylan Slack · Yinlam Chow · Bo Dai · Nevan Wichers -
2023 Poster: Stochastic Gradient Succeeds for Bandits »
Jincheng Mei · Zixin Zhong · Bo Dai · Alekh Agarwal · Csaba Szepesvari · Dale Schuurmans -
2023 Poster: Revisiting Simple Regret: Fast Rates for Returning a Good Arm »
Yao Zhao · Connor J Stephens · Csaba Szepesvari · Kwang-Sung Jun -
2023 Poster: The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation »
Philip Amortila · Nan Jiang · Csaba Szepesvari -
2023 Poster: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice »
Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo -
2022 Poster: Contextual Information-Directed Sampling »
Botao Hao · Tor Lattimore · Chao Qin -
2022 Poster: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Poster: Making Linear MDPs Practical via Contrastive Representation Learning »
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai -
2022 Poster: A Parametric Class of Approximate Gradient Updates for Policy Optimization »
Ramki Gummadi · Saurabh Kumar · Junfeng Wen · Dale Schuurmans -
2022 Spotlight: A Parametric Class of Approximate Gradient Updates for Policy Optimization »
Ramki Gummadi · Saurabh Kumar · Junfeng Wen · Dale Schuurmans -
2022 Spotlight: Making Linear MDPs Practical via Contrastive Representation Learning »
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai -
2022 Spotlight: Contextual Information-Directed Sampling »
Botao Hao · Tor Lattimore · Chao Qin -
2022 Spotlight: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Poster: Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization »
Hanjun Dai · Mengjiao Yang · Yuan Xue · Dale Schuurmans · Bo Dai -
2022 Spotlight: Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization »
Hanjun Dai · Mengjiao Yang · Yuan Xue · Dale Schuurmans · Bo Dai -
2021 : Invited Speaker: Bo Dai: Leveraging Non-uniformity in Policy Gradient »
Bo Dai -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 : RL + Recommender Systems Panel »
Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li -
2021 : RL Foundation Panel »
Matthew Botvinick · Thomas Dietterich · Leslie Kaelbling · John Langford · Warren B Powell · Csaba Szepesvari · Lihong Li · Yuxi Li -
2021 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Minmin Chen · Omer Gottesman · Lihong Li · Zongqing Lu · Rupam Mahmood · Niranjani Prasad · Zhiwei (Tony) Qin · Csaba Szepesvari · Matthew Taylor -
2021 Poster: Overcoming Catastrophic Forgetting by Bayesian Generative Regularization »
PEI-HUNG Chen · Wei Wei · Cho-Jui Hsieh · Bo Dai -
2021 Poster: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Spotlight: Overcoming Catastrophic Forgetting by Bayesian Generative Regularization »
PEI-HUNG Chen · Wei Wei · Cho-Jui Hsieh · Bo Dai -
2021 Spotlight: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Poster: Near-Optimal Representation Learning for Linear Bandits and Linear RL »
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang -
2021 Poster: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient »
Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang -
2021 Poster: LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs »
Hongyu Ren · Hanjun Dai · Bo Dai · Xinyun Chen · Michihiro Yasunaga · Haitian Sun · Dale Schuurmans · Jure Leskovec · Denny Zhou -
2021 Poster: Improved Regret Bound and Experience Replay in Regularized Policy Iteration »
Nevena Lazic · Dong Yin · Yasin Abbasi-Yadkori · Csaba Szepesvari -
2021 Poster: Leveraging Non-uniformity in First-order Non-convex Optimization »
Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: A Distribution-dependent Analysis of Meta Learning »
Mikhail Konobeev · Ilja Kuzborskij · Csaba Szepesvari -
2021 Spotlight: LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs »
Hongyu Ren · Hanjun Dai · Bo Dai · Xinyun Chen · Michihiro Yasunaga · Haitian Sun · Dale Schuurmans · Jure Leskovec · Denny Zhou -
2021 Oral: Improved Regret Bound and Experience Replay in Regularized Policy Iteration »
Nevena Lazic · Dong Yin · Yasin Abbasi-Yadkori · Csaba Szepesvari -
2021 Spotlight: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient »
Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang -
2021 Spotlight: Leveraging Non-uniformity in First-order Non-convex Optimization »
Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Spotlight: Near-Optimal Representation Learning for Linear Bandits and Linear RL »
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang -
2021 Spotlight: A Distribution-dependent Analysis of Meta Learning »
Mikhail Konobeev · Ilja Kuzborskij · Csaba Szepesvari -
2021 Poster: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference »
Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang -
2021 Poster: Instabilities of Offline RL with Pre-Trained Neural Representation »
Ruosong Wang · Yifan Wu · Ruslan Salakhutdinov · Sham Kakade -
2021 Spotlight: Instabilities of Offline RL with Pre-Trained Neural Representation »
Ruosong Wang · Yifan Wu · Ruslan Salakhutdinov · Sham Kakade -
2021 Spotlight: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference »
Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang -
2021 Town Hall: Town Hall »
John Langford · Marina Meila · Tong Zhang · Le Song · Stefanie Jegelka · Csaba Szepesvari -
2021 Poster: EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL »
Seyed Kamyar Seyed Ghasemipour · Dale Schuurmans · Shixiang Gu -
2021 Spotlight: EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL »
Seyed Kamyar Seyed Ghasemipour · Dale Schuurmans · Shixiang Gu -
2020 : Efficient Planning in Large MDPs with Weak Linear Function Approximation - Csaba Szepesvari »
Csaba Szepesvari -
2020 : Speaker Panel »
Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy -
2020 Poster: Linear bandits with Stochastic Delayed Feedback »
Claire Vernade · Alexandra Carpentier · Tor Lattimore · Giovanni Zappella · Beyza Ermis · Michael Brueckner -
2020 Poster: Energy-Based Processes for Exchangeable Data »
Mengjiao Yang · Bo Dai · Hanjun Dai · Dale Schuurmans -
2020 Poster: On the Global Convergence Rates of Softmax Policy Gradient Methods »
Jincheng Mei · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: ConQUR: Mitigating Delusional Bias in Deep Q-Learning »
DiJia Su · Jayden Ooi · Tyler Lu · Dale Schuurmans · Craig Boutilier -
2020 Poster: Model-Based Reinforcement Learning with Value-Targeted Regression »
Alex Ayoub · Zeyu Jia · Csaba Szepesvari · Mengdi Wang · Lin Yang -
2020 Poster: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks »
Denny Zhou · Mao Ye · Chen Chen · Tianjian Meng · Mingxing Tan · Xiaodan Song · Quoc Le · Qiang Liu · Dale Schuurmans -
2020 Poster: Batch Stationary Distribution Estimation »
Junfeng Wen · Bo Dai · Lihong Li · Dale Schuurmans -
2020 Poster: An Optimistic Perspective on Offline Deep Reinforcement Learning »
Rishabh Agarwal · Dale Schuurmans · Mohammad Norouzi -
2020 Poster: Learning with Good Feature Representations in Bandits and in RL with a Generative Model »
Tor Lattimore · Csaba Szepesvari · Gellért Weisz -
2020 Poster: A simpler approach to accelerated optimization: iterative averaging meets optimism »
Pooria Joulani · Anant Raj · Andras Gyorgy · Csaba Szepesvari -
2020 Poster: Neural Contextual Bandits with UCB-based Exploration »
Dongruo Zhou · Lihong Li · Quanquan Gu -
2020 Poster: Scalable Deep Generative Modeling for Sparse Graphs »
Hanjun Dai · Azade Nova · Yujia Li · Bo Dai · Dale Schuurmans -
2019 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Alborz Geramifard · Lihong Li · Csaba Szepesvari · Tao Wang -
2019 Poster: Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment »
Yifan Wu · Ezra Winston · Divyansh Kaushik · Zachary Lipton -
2019 Poster: POLITEX: Regret Bounds for Policy Iteration using Expert Prediction »
Yasin Abbasi-Yadkori · Peter Bartlett · Kush Bhatia · Nevena Lazic · Csaba Szepesvari · Gellért Weisz -
2019 Poster: Learning to Generalize from Sparse and Underspecified Rewards »
Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi -
2019 Oral: Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment »
Yifan Wu · Ezra Winston · Divyansh Kaushik · Zachary Lipton -
2019 Oral: Learning to Generalize from Sparse and Underspecified Rewards »
Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi -
2019 Oral: POLITEX: Regret Bounds for Policy Iteration using Expert Prediction »
Yasin Abbasi-Yadkori · Peter Bartlett · Kush Bhatia · Nevena Lazic · Csaba Szepesvari · Gellért Weisz -
2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2019 Poster: Online Learning to Rank with Features »
Shuai Li · Tor Lattimore · Csaba Szepesvari -
2019 Poster: Understanding the Impact of Entropy on Policy Optimization »
Zafarali Ahmed · Nicolas Le Roux · Mohammad Norouzi · Dale Schuurmans -
2019 Oral: Understanding the Impact of Entropy on Policy Optimization »
Zafarali Ahmed · Nicolas Le Roux · Mohammad Norouzi · Dale Schuurmans -
2019 Oral: Online Learning to Rank with Features »
Shuai Li · Tor Lattimore · Csaba Szepesvari -
2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits »
Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh -
2019 Poster: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Poster: The Value Function Polytope in Reinforcement Learning »
Robert Dadashi · Marc Bellemare · Adrien Ali Taiga · Nicolas Le Roux · Dale Schuurmans -
2019 Oral: The Value Function Polytope in Reinforcement Learning »
Robert Dadashi · Marc Bellemare · Adrien Ali Taiga · Nicolas Le Roux · Dale Schuurmans -
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Oral: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2018 Poster: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers »
Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama -
2018 Oral: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers »
Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama -
2018 Poster: Bandits with Delayed, Aggregated Anonymous Feedback »
Ciara Pike-Burke · Shipra Agrawal · Csaba Szepesvari · Steffen Grünewälder -
2018 Poster: Scalable Bilinear Pi Learning Using State and Action Features »
Yichen Chen · Lihong Li · Mengdi Wang -
2018 Poster: Towards Black-box Iterative Machine Teaching »
Weiyang Liu · Bo Dai · Xingguo Li · Zhen Liu · James Rehg · Le Song -
2018 Poster: SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation »
Bo Dai · Albert Shaw · Lihong Li · Lin Xiao · Niao He · Zhen Liu · Jianshu Chen · Le Song -
2018 Oral: Towards Black-box Iterative Machine Teaching »
Weiyang Liu · Bo Dai · Xingguo Li · Zhen Liu · James Rehg · Le Song -
2018 Oral: Bandits with Delayed, Aggregated Anonymous Feedback »
Ciara Pike-Burke · Shipra Agrawal · Csaba Szepesvari · Steffen Grünewälder -
2018 Oral: Scalable Bilinear Pi Learning Using State and Action Features »
Yichen Chen · Lihong Li · Mengdi Wang -
2018 Oral: SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation »
Bo Dai · Albert Shaw · Lihong Li · Lin Xiao · Niao He · Zhen Liu · Jianshu Chen · Le Song -
2018 Poster: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2018 Poster: Learning Steady-States of Iterative Algorithms over Graphs »
Hanjun Dai · Zornitsa Kozareva · Bo Dai · Alex Smola · Le Song -
2018 Oral: Learning Steady-States of Iterative Algorithms over Graphs »
Hanjun Dai · Zornitsa Kozareva · Bo Dai · Alex Smola · Le Song -
2018 Oral: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2017 Poster: Stochastic Generative Hashing »
Bo Dai · Ruiqi Guo · Sanjiv Kumar · Niao He · Le Song -
2017 Talk: Stochastic Generative Hashing »
Bo Dai · Ruiqi Guo · Sanjiv Kumar · Niao He · Le Song -
2017 Poster: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Talk: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Poster: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen -
2017 Poster: Provably Optimal Algorithms for Generalized Linear Contextual Bandits »
Lihong Li · Yu Lu · Dengyong Zhou -
2017 Poster: Iterative Machine Teaching »
Weiyang Liu · Bo Dai · Ahmad Humayun · Charlene Tay · Chen Yu · Linda Smith · James Rehg · Le Song -
2017 Talk: Iterative Machine Teaching »
Weiyang Liu · Bo Dai · Ahmad Humayun · Charlene Tay · Chen Yu · Linda Smith · James Rehg · Le Song -
2017 Talk: Provably Optimal Algorithms for Generalized Linear Contextual Bandits »
Lihong Li · Yu Lu · Dengyong Zhou -
2017 Talk: Online Learning to Rank in Stochastic Click Models »
Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen