Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette · Martin Wainwright · Emma Brunskill

Sat Jul 24 10:45 AM -- 10:57 AM (PDT)

Actor-critic methods are widely used in offline reinforcement learning practice, but they remain understudied theoretically. In this work, we show that the pessimism principle can be incorporated naturally into actor-critic formulations. We construct an offline actor-critic algorithm for a linear MDP model that is more general than the low-rank model. The procedure is both minimax optimal and computationally tractable.

Author Information

Andrea Zanette (Stanford University)
Martin Wainwright (UC Berkeley / Voleon)
Emma Brunskill (Stanford University)

More from the Same Authors