

Poster

Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics

Xinyu Zhang · Wenjie Qiu · Yi-Chen Li · Lei Yuan · Chengxing Jia · Zongzhang Zhang · Yang Yu


Abstract:

Offline Reinforcement Learning (RL) aims to learn an optimal policy from pre-collected datasets. Previous offline RL studies mostly assume that the learned policy is deployed in a stationary environment. Real-world scenarios, however, naturally involve perturbations, necessitating policies that can adapt to non-stationary environments. In this paper, we consider learning a policy that can rapidly adapt to dynamics changes from offline datasets generated in different environments. To this end, we propose Debiased Offline Representation learning for fast online Adaptation (DORA). DORA employs a context encoder that infers the current dynamics from recent state-action pairs. Due to the finiteness of the offline dataset, however, the representations produced by the encoder may exhibit a biased correlation with the unknown data-collecting behavior policy. This leads to erroneous identification of the dynamics when the learned policy is used to collect context during online evaluation. To ensure the accuracy of the encoder, DORA follows the information bottleneck principle to 1) maximize the mutual information between the representation and the dynamics and 2) minimize the mutual information between the representation and the behavior policy. For tractable optimization, we derive a lower bound and an upper bound for these two objectives, respectively. Experimental results on 6 MuJoCo tasks with 3 changeable parameters show that DORA significantly outperforms existing baselines.
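The abstract describes two mutual-information objectives on the context encoder's representation. The sketch below is not the authors' implementation; it only illustrates one plausible way to instantiate them, assuming a classifier-based (Barber-Agakov-style) lower bound on the representation-dynamics mutual information and a CLUB-style upper bound on the representation-behavior-action mutual information. All dimensions, network sizes, and names (ContextEncoder, training_step, beta, etc.) are illustrative assumptions.

```python
# Hypothetical sketch of the two mutual-information objectives, not DORA's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, CONTEXT_LEN, Z_DIM, NUM_DYNAMICS = 11, 3, 8, 16, 6

class ContextEncoder(nn.Module):
    """Encodes recent (state, action) pairs into a dynamics representation z."""
    def __init__(self):
        super().__init__()
        in_dim = CONTEXT_LEN * (STATE_DIM + ACTION_DIM)
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, Z_DIM))

    def forward(self, context):                # context: (B, CONTEXT_LEN, S+A)
        return self.net(context.flatten(1))

encoder = ContextEncoder()
dyn_classifier = nn.Linear(Z_DIM, NUM_DYNAMICS)   # variational q(e | z), lower bound
act_predictor = nn.Linear(Z_DIM, ACTION_DIM)      # variational q(a | z), CLUB upper bound
opt_enc = torch.optim.Adam(list(encoder.parameters())
                           + list(dyn_classifier.parameters()), lr=3e-4)
opt_q = torch.optim.Adam(act_predictor.parameters(), lr=3e-4)

def training_step(context, dyn_label, behavior_action, beta=0.1):
    z = encoder(context)
    # Step 1: fit q(a|z) by regression on a detached z (CLUB needs a fitted q).
    opt_q.zero_grad()
    F.mse_loss(act_predictor(z.detach()), behavior_action).backward()
    opt_q.step()
    # Step 2a: lower bound on I(z; dynamics) -- classify which dynamics produced z.
    lb_loss = F.cross_entropy(dyn_classifier(z), dyn_label)
    # Step 2b: CLUB-style upper bound on I(z; behavior action) --
    # log q(a|z) on matched pairs minus log q(a|z) on shuffled (marginal) pairs.
    pred = act_predictor(z)
    pos = -((pred - behavior_action) ** 2).sum(-1)
    neg = -((pred - behavior_action[torch.randperm(len(z))]) ** 2).sum(-1)
    club_upper = (pos - neg).mean()
    # Minimize: classification loss (maximize the lower bound) + weighted upper bound.
    loss = lb_loss + beta * club_upper
    opt_enc.zero_grad()
    loss.backward()
    opt_enc.step()
    return loss.item()

# Toy batch with made-up shapes, just to show the call.
ctx = torch.randn(32, CONTEXT_LEN, STATE_DIM + ACTION_DIM)
labels = torch.randint(0, NUM_DYNAMICS, (32,))
actions = torch.randn(32, ACTION_DIM)
print(training_step(ctx, labels, actions))
```

The alternating update (fitting q(a|z) before the encoder step) follows the usual CLUB recipe; the paper's actual bounds and estimators may differ.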
