Actor-critic methods are widely used in offline reinforcement learning practice but remain understudied theoretically. In this work we show that the pessimism principle can be naturally incorporated into actor-critic formulations. We design an offline actor-critic algorithm for a linear MDP model that is more general than the low-rank MDP model. The procedure is both minimax optimal and computationally tractable.
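The pessimism principle can be illustrated with a small sketch. It is not the authors' algorithm. Instead it uses a simpler, widely used device: an elliptical uncertainty penalty subtracted from a ridge-regression critic with linear features, paired with a softmax (mirror-descent style) actor update. The function names `pessimistic_q` and `actor_step`, the parameters `beta`, `lam`, and `eta`, and the toy data are all hypothetical. To keep the example short, it has horizon one. The critic's regression targets are therefore just rewards, and the critic does not change as the actor updates.

```python
import numpy as np

# Hypothetical sketch: a pessimistic linear critic plus a softmax actor.
# Not the authors' procedure; beta, lam, eta are illustrative knobs.

def pessimistic_q(phi_data, targets, phi_query, beta, lam=1.0):
    """Lower confidence bound on Q(s, a) = phi(s, a) @ w.

    phi_data:  (n, d) features of logged state-action pairs
    targets:   (n,) regression targets (rewards here, since horizon is 1)
    phi_query: (m, d) features at which to evaluate the critic
    """
    d = phi_data.shape[1]
    sigma = phi_data.T @ phi_data + lam * np.eye(d)   # regularized covariance
    w = np.linalg.solve(sigma, phi_data.T @ targets)  # ridge-regression weights
    sigma_inv = np.linalg.inv(sigma)
    # Elliptical width sqrt(phi^T Sigma^{-1} phi): large where data is scarce.
    width = np.sqrt(np.sum((phi_query @ sigma_inv) * phi_query, axis=-1))
    return phi_query @ w - beta * width               # subtract width: pessimism


def actor_step(log_pi, q, eta):
    """Softmax (exponentiated-weights) update: pi_new ∝ pi * exp(eta * q)."""
    logits = log_pi + eta * q
    logits = logits - logits.max()                    # numerical stability
    return logits - np.log(np.exp(logits).sum())      # renormalized log-policy


# Toy offline data for a single state with two actions (one-hot features):
# action 0 logged 100 times with reward 0.5, action 1 logged once with reward 1.0.
phi_data = np.array([[1.0, 0.0]] * 100 + [[0.0, 1.0]])
rewards = np.array([0.5] * 100 + [1.0])
phi_actions = np.eye(2)

q_greedy = pessimistic_q(phi_data, rewards, phi_actions, beta=0.0)
q_pess = pessimistic_q(phi_data, rewards, phi_actions, beta=1.0)

log_pi = np.log(np.full(2, 0.5))                      # start from the uniform policy
for _ in range(50):
    log_pi = actor_step(log_pi, q_pess, eta=1.0)
pi = np.exp(log_pi)

print("greedy Q:", q_greedy)
print("pessimistic Q:", q_pess)
print("final policy:", pi)
```

Without the penalty (`beta=0`), the ridge estimate slightly favors the rarely logged action 1. With the penalty, that action's wide confidence width pushes its value below the value of the well-covered action 0, and the actor concentrates on action 0. How to choose `beta` is a modeling question this sketch does not address.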
Author Information
Andrea Zanette (Stanford University)
Martin Wainwright (UC Berkeley / Voleon)
Emma Brunskill (Stanford University)

Emma Brunskill is a tenured associate professor in the Computer Science Department at Stanford University. Brunskill’s lab aims to create AI systems that learn from few samples to robustly make good decisions, and is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill has received an NSF CAREER award, an Office of Naval Research Young Investigator Award, a Microsoft Faculty Fellow award, and an alumni impact award from the computer science and engineering department at the University of Washington. Brunskill and her lab have received multiple best paper nominations and awards, both for their AI and machine learning work (UAI best paper, Reinforcement Learning and Decision Making Symposium best paper twice) and for their work in AI for education (Intelligent Tutoring Systems Conference, Educational Data Mining Conference three times, CHI).
More from the Same Authors
2021 : Optimal and instance-dependent oracle inequalities for policy evaluation »
Wenlong Mou · Ashwin Pananjady · Martin Wainwright
2021 : Model-based Offline Reinforcement Learning with Local Misspecification »
Kefan Dong · Ramtin Keramati · Emma Brunskill
2021 : Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity »
Jonathan Lee · Weihao Kong · Aldo Pacchiano · Vidya Muthukumar · Emma Brunskill
2021 : Avoiding Overfitting to the Importance Weights in Offline Policy Optimization »
Yao Liu · Emma Brunskill
2022 : Giving Complex Feedback in Online Student Learning with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn
2022 : Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn
2023 : Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan Lee · Emma Brunskill
2023 : In-Context Decision-Making from Supervised Pretraining »
Jonathan Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill
2023 Panel: ICML Education Outreach Panel »
Andreas Krause · Barbara Engelhardt · Emma Brunskill · Kyunghyun Cho
2022 Poster: Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning »
Andrea Zanette · Martin Wainwright
2022 Poster: A new similarity measure for covariate shift with applications to nonparametric regression »
Reese Pathak · Cong Ma · Martin Wainwright
2022 Spotlight: Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning »
Andrea Zanette · Martin Wainwright
2022 Oral: A new similarity measure for covariate shift with applications to nonparametric regression »
Reese Pathak · Cong Ma · Martin Wainwright
2022 : Invited Talk: Emma Brunskill »
Emma Brunskill
2021 : Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning »
Andrea Zanette · Martin Wainwright · Emma Brunskill
2021 Poster: Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL »
Andrea Zanette
2021 Oral: Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL »
Andrea Zanette
2020 Workshop: Theoretical Foundations of Reinforcement Learning »
Emma Brunskill · Thodoris Lykouris · Max Simchowitz · Wen Sun · Mengdi Wang
2020 Poster: Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions »
Omer Gottesman · Joseph Futoma · Yao Liu · Sonali Parbhoo · Leo Celi · Emma Brunskill · Finale Doshi-Velez
2020 Poster: Learning Near Optimal Policies with Low Inherent Bellman Error »
Andrea Zanette · Alessandro Lazaric · Mykel Kochenderfer · Emma Brunskill
2020 Poster: Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling »
Yao Liu · Pierre-Luc Bacon · Emma Brunskill
2019 Workshop: Exploration in Reinforcement Learning Workshop »
Benjamin Eysenbach · Surya Bhupatiraju · Shixiang Gu · Harrison Edwards · Martha White · Pierre-Yves Oudeyer · Kenneth Stanley · Emma Brunskill
2019 : Emma Brunskill (Stanford) - Minimizing & Understanding the Data Needed to Learn to Make Good Sequences of Decisions »
Emma Brunskill
2019 : panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (DeepMind) »
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh
2019 Poster: Combining parametric and nonparametric models for off-policy evaluation »
Omer Gottesman · Yao Liu · Scott Sussex · Emma Brunskill · Finale Doshi-Velez
2019 Oral: Combining parametric and nonparametric models for off-policy evaluation »
Omer Gottesman · Yao Liu · Scott Sussex · Emma Brunskill · Finale Doshi-Velez
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill
2019 Poster: Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds »
Andrea Zanette · Emma Brunskill
2019 Poster: Separable value functions across time-scales »
Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill
2019 Oral: Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds »
Andrea Zanette · Emma Brunskill
2019 Oral: Separable value functions across time-scales »
Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill
2018 Poster: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill
2018 Oral: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill
2018 Poster: Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs »
Andrea Zanette · Emma Brunskill
2018 Oral: Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs »
Andrea Zanette · Emma Brunskill