Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette · Martin Wainwright · Emma Brunskill

Actor-critic methods are widely used in offline reinforcement learning practice but are understudied theoretically. In this work, we show that the pessimism principle can be naturally incorporated into actor-critic formulations. We develop an offline actor-critic algorithm for a linear MDP model that is more general than the low-rank model. The resulting procedure is both minimax optimal and computationally tractable.
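
To make the pessimism idea concrete, below is a minimal, hedged sketch of a pessimistic critic for a linear Q-function fit on offline data, with a simple greedy actor step. The penalty form (an elliptical uncertainty bonus from the offline feature covariance), all names, and all hyperparameters are illustrative assumptions, not the authors' algorithm; a full actor-critic method would instead maintain and update a parametric policy.

```python
# Hypothetical sketch of pessimism in an offline critic; not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
d, n_actions, n = 8, 4, 500          # feature dim, number of actions, offline samples

# Synthetic offline dataset: features phi(s, a), rewards, next-state features per action.
phi = rng.normal(size=(n, d))
rewards = rng.normal(size=n)
phi_next = rng.normal(size=(n, n_actions, d))
gamma, beta, lam = 0.95, 1.0, 1e-3   # discount, pessimism scale, ridge parameter (assumed)

# Ridge-regularized empirical covariance of the offline features.
Sigma = phi.T @ phi + lam * np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)

def uncertainty(features):
    # Elliptical bonus sqrt(phi^T Sigma^{-1} phi): large where the data covers poorly.
    return np.sqrt(np.einsum("...i,ij,...j->...", features, Sigma_inv, features))

w = np.zeros(d)                       # linear critic weights
for _ in range(50):                   # fitted-Q style critic iterations
    # Pessimistic next-state value: penalize Q-estimates by the uncertainty bonus.
    q_next = phi_next @ w - beta * uncertainty(phi_next)
    targets = rewards + gamma * q_next.max(axis=1)
    # Least-squares critic fit to the pessimistic targets.
    w = Sigma_inv @ (phi.T @ targets)

# Greedy "actor" step with respect to the pessimistic critic at a query state.
phi_query = rng.normal(size=(n_actions, d))
action = np.argmax(phi_query @ w - beta * uncertainty(phi_query))
print("pessimistic greedy action:", action)
```

The penalty steers the learned policy away from state-action regions the offline dataset covers poorly, which is the intuition behind pessimism in the offline setting; the specific bonus and update schedule above are chosen only for brevity.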

Author Information

Andrea Zanette (Stanford University)
Martin Wainwright (UC Berkeley / Voleon)
Emma Brunskill (Stanford University)
