Poster in Workshop: The Second Workshop on Spurious Correlations, Invariance and Stability

Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline Reinforcement Learning

PENG CHENG · Xianyuan Zhan · Zhihao Wu · Wenjia Zhang · Youfang Lin · Shou cheng Song · Han Wang


Abstract

Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets. However, the performance of existing offline RL algorithms depends heavily on the scale and state-action space coverage of the dataset. Real-world data collection is often expensive and uncontrollable, which leads to small datasets with narrow coverage and poses significant challenges for practical deployment of offline RL. In this paper, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for out-of-distribution (OOD) samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. We find that TSRL achieves strong performance on small benchmark datasets with as few as 1% of the original samples.
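
The consistency idea between forward and reverse latent dynamics can be illustrated with a small sketch. This is not the TDM implementation: the abstract does not specify the loss, so the example assumes that the forward model predicts the latent change from (z, a), the reverse model predicts the opposite change from (z', a), and that T-symmetry compliance is measured by how well the two predictions cancel. The linear dynamics, the matrices `A` and `B`, and all function names are hypothetical and chosen only to make the example runnable.

```python
import numpy as np

def t_symmetry_residual(forward_dynamics, reverse_dynamics, z, a, z_next):
    """Per-sample squared mismatch between forward and reverse latent dynamics.

    Assumed convention (not taken from the paper):
      forward_dynamics(z, a)      -> predicted latent change going forward in time
      reverse_dynamics(z_next, a) -> predicted latent change going backward in time
    For a transition that respects time-reversal symmetry the two changes cancel.
    """
    dz_forward = forward_dynamics(z, a)
    dz_reverse = reverse_dynamics(z_next, a)
    return np.sum((dz_forward + dz_reverse) ** 2, axis=-1)

# Hypothetical toy latent dynamics: z_next = z + (A z + B a)
rng = np.random.default_rng(0)
latent_dim, action_dim, n = 4, 2, 8
A = 0.1 * rng.standard_normal((latent_dim, latent_dim))
B = 0.1 * rng.standard_normal((latent_dim, action_dim))
I = np.eye(latent_dim)

def forward_dynamics(z, a):
    # Latent change when stepping forward from z under action a.
    return z @ A.T + a @ B.T

def reverse_dynamics(z_next, a):
    # Exact inverse of the toy model: recover z from z_next, return z - z_next.
    z_prev = np.linalg.solve(I + A, (z_next - a @ B.T).T).T
    return z_prev - z_next

z = rng.standard_normal((n, latent_dim))
a = rng.standard_normal((n, action_dim))
z_next = z + forward_dynamics(z, a)

# Transitions generated by the model itself comply with the symmetry.
res_consistent = t_symmetry_residual(forward_dynamics, reverse_dynamics, z, a, z_next)

# Transitions with a corrupted next state violate it and receive a larger residual.
res_perturbed = t_symmetry_residual(forward_dynamics, reverse_dynamics, z, a, z_next + 1.0)
```

In this toy setting the residual is close to zero for transitions that follow the dynamics and larger for corrupted ones, which is the sense in which a compliance score of this kind could serve as a reliability measure for unfamiliar samples.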
