Learning Pseudometric-based Action Representations for Offline Reinforcement Learning

Pengjie Gu · Mengchen Zhao · Chen Chen · Dong Li · Jianye Hao · Bo An

Hall E #623

Keywords: [ RL: Batch/Offline ] [ MISC: Everything Else ]

Wed 20 Jul 3:30 p.m. PDT — 5:30 p.m. PDT

Spotlight presentation: Miscellaneous Aspects of Machine Learning/Reinforcement Learning
Wed 20 Jul 1:30 p.m. PDT — 3 p.m. PDT

Abstract:

Offline reinforcement learning (RL) is a promising approach for practical applications because it does not require interaction with real-world environments. However, existing offline RL methods work well only in environments with continuous or small discrete action spaces. In environments with large discrete action spaces, such as recommender systems and dialogue systems, their performance drops drastically because they suffer from inaccurate value estimation for a large proportion of out-of-distribution (o.o.d.) actions. Recent work has shown that online RL benefits from incorporating semantic information into action representations, but these methods fail to learn reasonable relative distances between action representations, which is key to reducing the influence of o.o.d. actions in offline RL. This paper proposes an action representation learning framework for offline RL based on a pseudometric that measures both the behavioral relation and the data-distributional relation between actions. We provide a theoretical analysis of the continuity of the expected Q-values and of offline policy improvement using the learned action representations. Experimental results show that our methods significantly improve the performance of two typical offline RL methods in environments with large discrete action spaces.
