

Poster

Hybrid Reinforcement Learning from Offline Observation Alone

Yuda Song · J. Bagnell · Aarti Singh


Abstract:

We consider the hybrid reinforcement learning setting, in which the agent has access to both an offline dataset and online interaction with the environment. Canonically, the offline data is assumed to contain complete action, reward, and transition information, whereas datasets containing only state information (also known as observation-only datasets) are more general, abundant, and practical. This motivates our study of hybrid RL with an observation-only offline dataset. While the task of competing with the best policy "covered" by the offline data can be solved when a reset model of the environment is available (i.e., one that can be reset to any state), we show evidence of hardness under the more general trace model (i.e., one that can only be reset to initial states and must produce full traces through the environment) without a further admissibility assumption on the offline data. Under the admissibility assumption, namely that the offline data could have been produced by the policy class we consider, we propose the first algorithm in the trace-model setting that matches the provable performance of algorithms in the reset-model setting. We also present proof-of-concept experiments suggesting the effectiveness of our algorithm in practice.
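For readers unfamiliar with the two access models contrasted in the abstract, the Python sketch below illustrates the distinction; it is not from the paper, and the class names, the `set_state` method, and the helper function are hypothetical, included only to make the reset-model vs. trace-model difference concrete.

```python
import random
from typing import Any, Tuple


class TraceModel:
    """Trace-model access: episodes can only start from the environment's
    initial-state distribution, so the agent must generate full traces."""

    def __init__(self, env):
        self.env = env  # assumed to expose Gym-style reset()/step()

    def reset(self) -> Any:
        # Only resets to an initial state.
        return self.env.reset()

    def step(self, action) -> Tuple[Any, float, bool]:
        return self.env.step(action)


class ResetModel(TraceModel):
    """Reset-model access: in addition to trace access, the agent may reset
    the simulator to any chosen state (e.g., a state from the
    observation-only offline dataset)."""

    def reset_to(self, state) -> Any:
        # Hypothetical capability: place the simulator in an arbitrary state.
        self.env.set_state(state)
        return state


def rollout_from_offline_state(model: ResetModel, offline_states, policy, horizon=50):
    """With a reset model, a policy can be rolled out directly from a state
    covered by the offline data; a trace model offers no such shortcut."""
    s = model.reset_to(random.choice(offline_states))
    total_reward = 0.0
    for _ in range(horizon):
        s, r, done = model.step(policy(s))
        total_reward += r
        if done:
            break
    return total_reward
```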
