Poster & Spotlight Talk
Workshop: Workshop on Reinforcement Learning Theory

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Andrea Zanette · Martin Wainwright · Emma Brunskill


Actor-critic methods are widely used in offline reinforcement learning practice but are understudied theoretically. In this work we show that the pessimism principle can be naturally incorporated into actor-critic formulations. We create an offline actor-critic algorithm for a linear MDP model more general than the low-rank model. The procedure is both minimax optimal and computationally tractable.
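
To make the pessimism principle concrete, the following is a minimal sketch on a one-step problem with linear features. It is not the authors' algorithm: it assumes a generic ridge-regression critic penalized by an elliptical uncertainty term, and a softmax actor updated by exponentiated gradient. All names (`pessimistic_q`, `actor_step`, `beta`, `lam`, `eta`) and the toy dataset are hypothetical choices made for illustration.

```python
import numpy as np

def pessimistic_q(phi, features, rewards, beta=1.0, lam=1.0):
    """Ridge estimate of a linear value, minus an uncertainty penalty.

    phi:      (num_actions, d) feature vectors to evaluate
    features: (n, d) feature vectors seen in the offline dataset
    rewards:  (n,) observed rewards
    """
    d = features.shape[1]
    sigma = features.T @ features + lam * np.eye(d)        # regularized covariance
    w_hat = np.linalg.solve(sigma, features.T @ rewards)   # ridge regression weights
    sigma_inv_phi = np.linalg.solve(sigma, phi.T).T        # rows are Sigma^{-1} phi_a
    penalty = np.sqrt(np.sum(phi * sigma_inv_phi, axis=1)) # sqrt(phi_a^T Sigma^{-1} phi_a)
    return phi @ w_hat - beta * penalty

def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

def actor_step(logits, q_values, eta=0.5):
    """One exponentiated-gradient update of a softmax policy."""
    return logits + eta * q_values

# Toy offline dataset (assumed): action 0 is well covered, action 1 is seen once.
phi = np.eye(2)                                   # one-hot features, one row per action
features = np.vstack([np.tile(phi[0], (50, 1)), phi[1][None, :]])
rewards = np.concatenate([np.full(50, 0.5), [1.0]])

naive_q = pessimistic_q(phi, features, rewards, beta=0.0)  # no penalty
q = pessimistic_q(phi, features, rewards, beta=1.0)        # pessimistic critic

logits = np.zeros(2)
for _ in range(20):                               # actor improves against the critic
    logits = actor_step(logits, q)
policy = softmax(logits)
```

In this toy dataset the unpenalized estimate slightly prefers the action observed only once, while the penalized critic, and hence the actor trained against it, prefers the well-covered action. That is the qualitative behavior the pessimism principle is meant to produce under poor data coverage.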