This paper studies model-based reinforcement learning (RL) for regret minimization. We focus on finite-horizon episodic RL where the transition model P belongs to a known family of models, a special case being the class of linear mixture models. We propose a model-based RL algorithm based on the optimism principle: in each episode, the algorithm constructs the set of models that are 'consistent' with the data collected so far, where consistency is measured by the total squared error the model incurs in predicting, along the observed transitions, the state values given by the most recent value estimate. The next value function is then obtained by solving the optimistic planning problem over the constructed set of models. We derive a regret bound which, in the special case of linear mixture models, takes the form O(d (H^3 T)^(1/2)), where H, T and d are the horizon, the total number of steps and the dimension of the parameter vector, respectively. In particular, this regret bound is independent of the total number of states or actions, and it is close to a lower bound Omega((H d T)^(1/2)). For a general model family, the regret bound is expressed in terms of the Eluder dimension of the model class.
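To make the construction concrete, the following is a minimal sketch (not the authors' implementation) of the value-targeted regression step in the linear-mixture special case, where the transition model has the form P_theta(s'|s,a) = <phi(s,a,s'), theta>. The names (phi, expectation_feature, ValueTargetedRegression), the ridge regularizer lam, the confidence radius beta, and the assumption of a small, enumerable state space are illustrative choices, not taken from the paper.

    import numpy as np

    def expectation_feature(phi, s, a, V, states):
        """x(s, a, V) = sum_{s'} phi(s, a, s') * V[s'], so that <x, theta> equals
        the expected next-state value E_theta[V(s') | s, a] under a linear mixture model."""
        return sum(V[sp] * phi(s, a, sp) for sp in states)

    class ValueTargetedRegression:
        """Ridge regression of observed next-state values V(s') onto x(s, a, V)."""

        def __init__(self, d, lam=1.0):
            self.A = lam * np.eye(d)   # regularized Gram matrix of expectation features
            self.b = np.zeros(d)       # accumulated feature-weighted value targets

        def update(self, x, v_next):
            # The 'consistency' criterion from the abstract: the total squared error
            # sum (<x, theta> - V(s'))^2, maintained here in closed form.
            self.A += np.outer(x, x)
            self.b += v_next * x

        def center(self):
            # Least-squares estimate theta_hat, the center of the confidence set.
            return np.linalg.solve(self.A, self.b)

        def optimistic_value(self, x, beta):
            # Optimism: upper bound on <x, theta> over the ellipsoid
            # {theta : (theta - theta_hat)^T A (theta - theta_hat) <= beta}.
            bonus = np.sqrt(beta * float(x @ np.linalg.solve(self.A, x)))
            return float(x @ self.center()) + bonus

In an episode, optimistic_value would be plugged into backward dynamic programming over the horizon H to produce the optimistic value function used for planning, and the value targets V(s') observed along the resulting trajectory would be fed back through update.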
Author Information
Alex Ayoub (University of Alberta)
Zeyu Jia (Peking University)
Csaba Szepesvari (DeepMind/University of Alberta)
Mengdi Wang (Princeton University)
Lin Yang (UCLA)
More from the Same Authors
- 2021: Gap-Dependent Unsupervised Exploration for Reinforcement Learning
  Jingfeng Wu · Vladimir Braverman · Lin Yang
- 2021: Online Sub-Sampling for Reinforcement Learning with General Function Approximation
  Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang
- 2021: Solving Multi-Arm Bandit Using a Few Bits of Communication
  Osama Hanna · Lin Yang · Christina Fragouli
- 2022: Provably Correct SGD-based Exploration for Linear Bandit
  Jialin Dong · Lin Yang
- 2022: Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
  Dingwen Kong · Lin Yang
- 2022: Policy Gradient: Theory for Making Best Use of It
  Mengdi Wang
- 2022 Poster: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach
  Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun
- 2022 Poster: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning
  Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar
- 2022 Poster: Optimal Estimation of Policy Gradient via Double Fitted Iteration
  Chengzhuo Ni · Ruiqi Zhang · Xiang Ji · Xuezhou Zhang · Mengdi Wang
- 2022 Poster: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
  Ruiqi Zhang · Xuezhou Zhang · Chengzhuo Ni · Mengdi Wang
- 2022 Spotlight: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning
  Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar
- 2022 Spotlight: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach
  Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun
- 2022 Spotlight: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
  Ruiqi Zhang · Xuezhou Zhang · Chengzhuo Ni · Mengdi Wang
- 2022 Spotlight: Optimal Estimation of Policy Gradient via Double Fitted Iteration
  Chengzhuo Ni · Ruiqi Zhang · Xiang Ji · Xuezhou Zhang · Mengdi Wang
- 2022 Social: Designing an RL system toward AGI
  Yi Wan · Alex Ayoub
- 2021 Workshop: Workshop on Reinforcement Learning Theory
  Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang
- 2021: RL Foundation Panel
  Matthew Botvinick · Thomas Dietterich · Leslie Kaelbling · John Langford · Warrren B Powell · Csaba Szepesvari · Lihong Li · Yuxi Li
- 2021 Workshop: Reinforcement Learning for Real Life
  Yuxi Li · Minmin Chen · Omer Gottesman · Lihong Li · Zongqing Lu · Rupam Mahmood · Niranjani Prasad · Zhiwei (Tony) Qin · Csaba Szepesvari · Matthew Taylor
- 2021 Poster: Meta-Thompson Sampling
  Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari
- 2021 Spotlight: Meta-Thompson Sampling
  Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari
- 2021 Social: RL Social
  Dibya Ghosh · Hager Radi · Derek Li · Alex Ayoub · Erfan Miahi · Rishabh Agarwal · Charline Le Lan · Abhishek Naik · John Martin · Shruti Mishra · Adrien Ali Taiga
- 2021 Poster: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
  Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang
- 2021 Poster: Improved Regret Bound and Experience Replay in Regularized Policy Iteration
  Nevena Lazic · Dong Yin · Yasin Abbasi-Yadkori · Csaba Szepesvari
- 2021 Poster: Provably Correct Optimization and Exploration with Non-linear Policies
  Fei Feng · Wotao Yin · Alekh Agarwal · Lin Yang
- 2021 Poster: Leveraging Non-uniformity in First-order Non-convex Optimization
  Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans
- 2021 Poster: Randomized Exploration in Reinforcement Learning with General Value Function Approximation
  Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang
- 2021 Poster: A Distribution-dependent Analysis of Meta Learning
  Mikhail Konobeev · Ilja Kuzborskij · Csaba Szepesvari
- 2021 Spotlight: Provably Correct Optimization and Exploration with Non-linear Policies
  Fei Feng · Wotao Yin · Alekh Agarwal · Lin Yang
- 2021 Oral: Improved Regret Bound and Experience Replay in Regularized Policy Iteration
  Nevena Lazic · Dong Yin · Yasin Abbasi-Yadkori · Csaba Szepesvari
- 2021 Spotlight: Randomized Exploration in Reinforcement Learning with General Value Function Approximation
  Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang
- 2021 Spotlight: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
  Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang
- 2021 Spotlight: Leveraging Non-uniformity in First-order Non-convex Optimization
  Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans
- 2021 Spotlight: A Distribution-dependent Analysis of Meta Learning
  Mikhail Konobeev · Ilja Kuzborskij · Csaba Szepesvari
- 2021 Poster: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
  Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang
- 2021 Spotlight: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
  Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang
- 2021 Poster: Safe Reinforcement Learning with Linear Function Approximation
  Sanae Amani Geshnigani · Christos Thrampoulidis · Lin Yang
- 2021 Spotlight: Safe Reinforcement Learning with Linear Function Approximation
  Sanae Amani Geshnigani · Christos Thrampoulidis · Lin Yang
- 2021 Town Hall: Town Hall
  John Langford · Marina Meila · Tong Zhang · Le Song · Stefanie Jegelka · Csaba Szepesvari
- 2021 Poster: On the Optimality of Batch Policy Optimization Algorithms
  Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2021 Spotlight: On the Optimality of Batch Policy Optimization Algorithms
  Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2020: QA for invited talk 7 Wang
  Mengdi Wang
- 2020: Invited talk 7 Wang
  Mengdi Wang
- 2020: Efficient Planning in Large MDPs with Weak Linear Function Approximation - Csaba Szepesvari
  Csaba Szepesvari
- 2020: Speaker Panel
  Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy
- 2020 Workshop: Theoretical Foundations of Reinforcement Learning
  Emma Brunskill · Thodoris Lykouris · Max Simchowitz · Wen Sun · Mengdi Wang
- 2020 Poster: On the Global Convergence Rates of Softmax Policy Gradient Methods
  Jincheng Mei · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans
- 2020 Poster: Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
  Lin Yang · Mengdi Wang
- 2020 Poster: Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
  Yaqi Duan · Zeyu Jia · Mengdi Wang
- 2020 Poster: Learning with Good Feature Representations in Bandits and in RL with a Generative Model
  Tor Lattimore · Csaba Szepesvari · Gellért Weisz
- 2020 Poster: A simpler approach to accelerated optimization: iterative averaging meets optimism
  Pooria Joulani · Anant Raj · András György · Csaba Szepesvari
- 2020 Poster: Nearly Linear Row Sampling Algorithm for Quantile Regression
  Yi Li · Ruosong Wang · Lin Yang · Hanrui Zhang
- 2020 Poster: Obtaining Adjustable Regularization for Free via Iterate Averaging
  Jingfeng Wu · Vladimir Braverman · Lin Yang
- 2019 Workshop: Reinforcement Learning for Real Life
  Yuxi Li · Alborz Geramifard · Lihong Li · Csaba Szepesvari · Tao Wang
- 2019 Poster: POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
  Yasin Abbasi-Yadkori · Peter Bartlett · Kush Bhatia · Nevena Lazic · Csaba Szepesvari · Gellért Weisz
- 2019 Oral: POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
  Yasin Abbasi-Yadkori · Peter Bartlett · Kush Bhatia · Nevena Lazic · Csaba Szepesvari · Gellért Weisz
- 2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
  Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh
- 2019 Poster: Online Learning to Rank with Features
  Shuai Li · Tor Lattimore · Csaba Szepesvari
- 2019 Poster: Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
  Lin Yang · Mengdi Wang
- 2019 Oral: Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
  Lin Yang · Mengdi Wang
- 2019 Oral: Online Learning to Rank with Features
  Shuai Li · Tor Lattimore · Csaba Szepesvari
- 2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
  Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh
- 2019 Poster: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration
  Gellért Weisz · András György · Csaba Szepesvari
- 2019 Oral: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration
  Gellért Weisz · András György · Csaba Szepesvari
- 2018 Poster: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
  Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama
- 2018 Poster: Estimation of Markov Chain via Rank-constrained Likelihood
  XUDONG LI · Mengdi Wang · Anru Zhang
- 2018 Oral: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
  Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama
- 2018 Oral: Estimation of Markov Chain via Rank-constrained Likelihood
  XUDONG LI · Mengdi Wang · Anru Zhang
- 2018 Poster: Bandits with Delayed, Aggregated Anonymous Feedback
  Ciara Pike-Burke · Shipra Agrawal · Csaba Szepesvari · Steffen Grünewälder
- 2018 Poster: Scalable Bilinear Pi Learning Using State and Action Features
  Yichen Chen · Lihong Li · Mengdi Wang
- 2018 Oral: Bandits with Delayed, Aggregated Anonymous Feedback
  Ciara Pike-Burke · Shipra Agrawal · Csaba Szepesvari · Steffen Grünewälder
- 2018 Oral: Scalable Bilinear Pi Learning Using State and Action Features
  Yichen Chen · Lihong Li · Mengdi Wang
- 2018 Poster: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
  Gellért Weisz · András György · Csaba Szepesvari
- 2018 Oral: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
  Gellért Weisz · András György · Csaba Szepesvari
- 2017 Poster: Online Learning to Rank in Stochastic Click Models
  Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen
- 2017 Poster: Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions
  Yichen Chen · Dongdong Ge · Mengdi Wang · Zizhuo Wang · Yinyu Ye · Hao Yin
- 2017 Talk: Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions
  Yichen Chen · Dongdong Ge · Mengdi Wang · Zizhuo Wang · Yinyu Ye · Hao Yin
- 2017 Talk: Online Learning to Rank in Stochastic Click Models
  Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen