Poster

Offline Actor-Critic Reinforcement Learning Scales to Large Models

Jost Tobias Springenberg · Abbas Abdolmaleki · Jingwei Zhang · Oliver M Groth · Michael Bloesch · Thomas Lampe · Philemon Brakel · Sarah Bechtle · Steven Kapturowski · Roland Hafner · Nicolas Heess · Martin Riedmiller

Hall C 4-9 #2707
Wed 24 Jul 4:30 a.m. PDT — 6 a.m. PDT
 
Oral presentation: Oral 4A Reinforcement Learning 2
Wed 24 Jul 7:30 a.m. PDT — 8:30 a.m. PDT

Abstract:

We show that offline actor-critic reinforcement learning can scale to large models, such as transformers, and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong supervised behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor-critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.
