Timezone: »

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette · Martin Wainwright · Emma Brunskill

Actor-critic methods are widely used in offline reinforcement learning practice but are understudied theoretically. In this work we show that the pessimism principle can be naturally incorporated into actor-critic formulations. We create an offline actor-critic algorithm for a linear MDP model more general than the low-rank model. The procedure is both minimax optimal and computationally tractable.

Author Information

Andrea Zanette (Stanford University)
Martin Wainwright (UC Berkeley / Voleon)
Emma Brunskill (Stanford University)
Emma Brunskill

Emma Brunskill is an associate tenured professor in the Computer Science Department at Stanford University. Brunskill’s lab aims to create AI systems that learn from few samples to robustly make good decisions and is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill has received a NSF CAREER award, Office of Naval Research Young Investigator Award, a Microsoft Faculty Fellow award and an alumni impact award from the computer science and engineering department at the University of Washington. Brunskill and her lab have received multiple best paper nominations and awards both for their AI and machine learning work (UAI best paper, Reinforcement Learning and Decision Making Symposium best paper twice) and for their work in Ai of education (Intelligent Tutoring Systems Conference, Educational Data Mining conference x3, CHI).

More from the Same Authors