A Connection between One-Step RL and Critic Regularization in Reinforcement Learning
Benjamin Eysenbach · Matthieu Geist · Sergey Levine · Ruslan Salakhutdinov

Thu Jul 27 04:30 PM -- 06:00 PM (PDT) @ Exhibit Hall 1 #310

As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One class of methods, known as one-step RL, performs just one step of policy improvement. These methods, which include advantage-weighted regression and conditional behavioral cloning, are thus simple and stable, but can have limited asymptotic performance. A second class of methods, known as critic regularization, performs many steps of policy improvement with a regularized objective. These methods typically require more compute but have appealing lower-bound guarantees. In this paper, we draw a connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While our theoretical results require assumptions (e.g., deterministic dynamics), our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly used hyperparameters.
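The two objects the abstract compares can be sketched concretely. Below is a minimal, hypothetical tabular-bandit sketch (the setup, variable names, and numbers are ours, not from the paper): a one-step policy obtained by advantage-weighted regression, pi(a) ∝ beta(a)·exp(Q^beta(a)/tau), next to a CQL-style penalized critic that lowers Q on actions the learned policy favors relative to the behavior data. It illustrates the quantities involved, not a proof of the coefficient-1 equivalence.

```python
import numpy as np

# Hypothetical tabular bandit: 4 actions, a known behavior policy beta,
# and the behavior policy's Q-values (all values chosen for illustration).
n_actions = 4
beta = np.array([0.4, 0.3, 0.2, 0.1])    # behavior policy beta(a)
q_beta = np.array([1.0, 2.0, 0.5, 1.5])  # Q^beta(a)

# One-step RL (advantage-weighted regression style): a single step of
# policy improvement, pi(a) proportional to beta(a) * exp(Q^beta(a) / tau).
tau = 1.0
logits = np.log(beta) + q_beta / tau
pi_one_step = np.exp(logits - logits.max())  # subtract max for stability
pi_one_step /= pi_one_step.sum()

# Critic regularization (CQL-style penalty, coefficient lam = 1): push Q
# down on actions the learned policy takes more often than the data,
# Q_pen(a) = Q^beta(a) - lam * (pi(a)/beta(a) - 1).
lam = 1.0
q_penalized = q_beta - lam * (pi_one_step / beta - 1.0)

print("one-step policy:", pi_one_step)
print("penalized Q:   ", q_penalized)
```

With lam = 1 the penalty term matches the regularization coefficient the paper analyzes; the paper's result concerns the fixed point of repeated penalized critic updates, which this single-step sketch does not iterate.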

Author Information

Benjamin Eysenbach (CMU→Princeton)
Matthieu Geist (Google)
Sergey Levine (UC Berkeley)
Ruslan Salakhutdinov (Carnegie Mellon University)