Poster

Simple Ingredients for Offline Reinforcement Learning

Edoardo Cetin · Andrea Tirinzoni · Matteo Pirotta · Alessandro Lazaric · Yann Ollivier · Ahmed Touati


Abstract:

Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer. In light of this finding, we conduct a large empirical study where we formulate and test several hypotheses to explain this failure. Surprisingly, we find that scale, more than algorithmic considerations, is the key factor influencing performance. We show that simple methods like AWAC and IQL with increased network size overcome the paradoxical failure modes from the inclusion of additional data in MOOD, and notably outperform prior state-of-the-art algorithms on the canonical D4RL benchmark.
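
The sketch below illustrates, in rough terms, the kind of change the abstract refers to by "increased network size": replacing the small MLPs commonly used for the critics and policies of AWAC/IQL-style agents with wider and deeper ones, while leaving the learning algorithm untouched. It is not the authors' code; the layer widths, depths, and dimensions are illustrative assumptions, not values reported in the paper.

    # Minimal sketch of scaling up the networks of an offline RL agent (assumed sizes).
    import torch
    import torch.nn as nn

    def make_mlp(in_dim, out_dim, hidden_dim=1024, num_layers=3):
        """Plain fully connected network; hidden_dim and num_layers control scale."""
        layers, dim = [], in_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        layers.append(nn.Linear(dim, out_dim))
        return nn.Sequential(*layers)

    # Hypothetical Q-networks for a 17-dim state, 6-dim action task.
    # A "small" baseline might use hidden_dim=256, num_layers=2; the scaled-up
    # variant simply increases these hyperparameters.
    q_small = make_mlp(17 + 6, 1, hidden_dim=256, num_layers=2)
    q_large = make_mlp(17 + 6, 1, hidden_dim=1024, num_layers=3)

    state, action = torch.randn(32, 17), torch.randn(32, 6)
    print(q_large(torch.cat([state, action], dim=-1)).shape)  # torch.Size([32, 1])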
