Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning

Zhecheng Yuan · Zhecheng Yuan · Zhengrong Xue · Zhengrong Xue · Bo Yuan · Bo Yuan · Xueqian Wang · Xueqian Wang · Yi Wu · Yi Wu · Yang Gao · Yang Gao · Huazhe Xu · Huazhe Xu


Learning generalizable policies that can adapt to unseen environments remains challenging in visual Reinforcement Learning (RL). Existing approaches try to acquire a robust representation via diversifying the appearances of in-domain observations for better generalization. Limited by the specific observations of the environment, these methods ignore the possibility of exploring diverse real-world image datasets. In this paper, we investigate how a visual RL agent would benefit from the off-the-shelf visual representations. Surprisingly, we find that the early layers in an ImageNet pre-trained ResNet model could provide rather generalizable representations for visual RL. Hence, we propose Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G), a simple yet effective framework that can generalize to the unseen visual scenarios in a zero-shot manner. Extensive experiments are conducted on DMControl Generalization Benchmark, DMControl Manipulation Tasks, and Drawer World to verify the effectiveness of PIE-G. Empirical evidence suggests PIE-G can significantly outperforms previous state-of-the-art methods in terms of generalization performance. In particular, PIE-G boasts a 55% generalization performance gain on average in the challenging video background setting.

Chat is not available.