Offline reinforcement learning (RL) is a promising approach for practical applications because it does not require interaction with real-world environments. However, existing offline RL methods work well only in environments with continuous or small discrete action spaces. In environments with large discrete action spaces, such as recommender systems and dialogue systems, the performance of existing methods drops drastically because they suffer from inaccurate value estimates for the large proportion of out-of-distribution (o.o.d.) actions. Although recent work has shown that online RL benefits from incorporating semantic information into action representations, these methods fail to learn reasonable relative distances between action representations, which is key for offline RL to reduce the influence of o.o.d. actions. This paper proposes an action representation learning framework for offline RL based on a pseudometric that measures both the behavioral relation and the data-distributional relation between actions. We provide a theoretical analysis of the continuity of the expected Q-values and of offline policy improvement using the learned action representations. Experimental results show that our method significantly improves the performance of two typical offline RL methods in environments with large discrete action spaces.
Author Information
Pengjie Gu (Nanyang Technological University)
Mengchen Zhao (Huawei Noah's Ark Lab)
Chen Chen (Huawei Noah’s Ark Lab)
Dong Li (Huawei Noah's Ark Lab)
Jianye Hao (Huawei Noah's Ark Lab)
Bo An (Nanyang Technological University)
Related Events (a corresponding poster, oral, or spotlight)
2022 Poster: Learning Pseudometric-based Action Representations for Offline Reinforcement Learning
Wed, Jul 20 through Thu, Jul 21, Room Hall E #623
More from the Same Authors
2021: Contingency-Aware Influence Maximization: A Reinforcement Learning Approach
Haipeng Chen · Wei Qiu · Han-Ching Ou · Bo An · Milind Tambe
2023 Poster: Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification
Dong Xing · Pengjie Gu · Qian Zheng · Xinrun Wang · Shanqi Liu · Longtao Zheng · Bo An · Gang Pan
2022 Poster: Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization
Minghuan Liu · Zhengbang Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao · Yong Yu · Jun Wang
2022 Poster: Neuro-Symbolic Hierarchical Rule Induction
Claire Glanois · Zhaohui Jiang · Xuening Feng · Paul Weng · Matthieu Zimmer · Dong Li · Wulong Liu · Jianye Hao
2022 Spotlight: Neuro-Symbolic Hierarchical Rule Induction
Claire Glanois · Zhaohui Jiang · Xuening Feng · Paul Weng · Matthieu Zimmer · Dong Li · Wulong Liu · Jianye Hao
2022 Spotlight: Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization
Minghuan Liu · Zhengbang Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao · Yong Yu · Jun Wang
2022 Poster: Mitigating Neural Network Overconfidence with Logit Normalization
Hongxin Wei · Renchunzi Xie · Hao Cheng · Lei Feng · Bo An · Sharon Li
2022 Spotlight: Mitigating Neural Network Overconfidence with Logit Normalization
Hongxin Wei · Renchunzi Xie · Hao Cheng · Lei Feng · Bo An · Sharon Li
2022 Poster: Open-Sampling: Exploring Out-of-Distribution Data for Re-balancing Long-tailed Datasets
Hongxin Wei · Lue Tao · Renchunzi Xie · Lei Feng · Bo An
2022 Spotlight: Open-Sampling: Exploring Out-of-Distribution Data for Re-balancing Long-tailed Datasets
Hongxin Wei · Lue Tao · Renchunzi Xie · Lei Feng · Bo An
2021 Poster: Pointwise Binary Classification with Pairwise Confidence Comparisons
Lei Feng · Senlin Shu · Nan Lu · Bo Han · Miao Xu · Gang Niu · Bo An · Masashi Sugiyama
2021 Poster: Learning from Similarity-Confidence Data
Yuzhou Cao · Lei Feng · Yitian Xu · Bo An · Gang Niu · Masashi Sugiyama
2021 Spotlight: Learning from Similarity-Confidence Data
Yuzhou Cao · Lei Feng · Yitian Xu · Bo An · Gang Niu · Masashi Sugiyama
2021 Spotlight: Pointwise Binary Classification with Pairwise Confidence Comparisons
Lei Feng · Senlin Shu · Nan Lu · Bo Han · Miao Xu · Gang Niu · Bo An · Masashi Sugiyama
2020 Poster: Learning Efficient Multi-agent Communication: An Information Bottleneck Approach
Rundong Wang · Xu He · Runsheng Yu · Wei Qiu · Bo An · Zinovi Rabinovich
2020 Poster: Learning with Multiple Complementary Labels
Lei Feng · Takuo Kaneko · Bo Han · Gang Niu · Bo An · Masashi Sugiyama