

Poster in Workshop: Models of Human Feedback for AI Alignment

Cross-Domain Knowledge Transfer for RL via Preference Consistency

Ting-Hsuan Huang · Ping-Chun Hsieh

Fri 26 Jul 8 a.m. PDT — 8 a.m. PDT

Abstract:

We study the cross-domain RL (CDRL) problem from the perspective of preference-based learning. We identify a critical correspondence identifiability issue (CII) in existing unsupervised CDRL methods and propose to mitigate CII with the weak supervision of preference feedback. Specifically, we propose the principle of cross-domain preference consistency (CDPC), which can serve as additional guidance for learning a proper correspondence between the source and target domains. To substantiate the principle of CDPC, we present an algorithm that integrates a state decoder, trained with the preference consistency loss, and a cross-domain MPC method for action selection during inference. Through extensive experiments in both MuJoCo and Robosuite, we demonstrate that CDPC can achieve more effective and data-efficient knowledge transfer across domains than the state-of-the-art CDRL benchmark methods.
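
The abstract does not spell out the loss or the planner, so the following is only a minimal sketch of how the two pieces could fit together, not the authors' implementation. It assumes a Bradley-Terry form for the preference consistency loss, a scalar toy state space, and exhaustive search over a small discrete action set in place of whatever optimizer the paper's MPC uses. All names (`decoder`, `source_reward`, `target_step`, `mpc_select_action`) are hypothetical.

```python
import itertools
import math


def trajectory_score(states, decoder, source_reward):
    """Sum of source-domain rewards over decoded target-domain states."""
    return sum(source_reward(decoder(s)) for s in states)


def preference_consistency_loss(traj_a, traj_b, a_preferred, decoder, source_reward):
    """Assumed Bradley-Terry negative log-likelihood of a preference label.

    The loss is small when the decoded trajectories are ranked by the
    source-domain reward in the same order as the target-domain preference.
    """
    score_a = trajectory_score(traj_a, decoder, source_reward)
    score_b = trajectory_score(traj_b, decoder, source_reward)
    margin = score_a - score_b if a_preferred else score_b - score_a
    # Numerically stable -log(sigmoid(margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mpc_select_action(state, target_step, decoder, source_reward, action_set, horizon=3):
    """Pick the first action of the best plan, scored in the source domain.

    Plans are rolled out with a target-domain model, decoded into source
    states, and ranked by source-domain reward. Exhaustive enumeration is an
    assumption that only suits tiny discrete action sets.
    """
    best_score, best_action = -math.inf, None
    for plan in itertools.product(action_set, repeat=horizon):
        s, visited = state, []
        for a in plan:
            s = target_step(s, a)
            visited.append(s)
        score = trajectory_score(visited, decoder, source_reward)
        if score > best_score:
            best_score, best_action = score, plan[0]
    return best_action


# Toy setup (all hypothetical): the source state is twice the target state,
# and the source-domain goal sits at 4.0.
def toy_decoder(target_state):
    return 2.0 * target_state


def toy_source_reward(source_state):
    return -abs(source_state - 4.0)


def toy_target_step(target_state, action):
    return target_state + action


consistent = preference_consistency_loss(
    [1.0, 2.0], [0.0, 0.0], True, toy_decoder, toy_source_reward)
inconsistent = preference_consistency_loss(
    [1.0, 2.0], [0.0, 0.0], False, toy_decoder, toy_source_reward)
action = mpc_select_action(
    0.0, toy_target_step, toy_decoder, toy_source_reward, [-1.0, 0.0, 1.0])
```

In this toy setting the loss is lower when the preference label agrees with the source-domain ranking of the decoded trajectories, which is the signal a decoder would be trained on. In practice the decoder would be a learned network and the loss would be minimized by gradient descent over many labeled trajectory pairs.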
