Offline reinforcement learning suffers from the out-of-distribution issue and extrapolation error. Most policy constraint methods regularize the density of the trained policy towards the behavior policy, which is too restrictive in most cases. We propose Supported Trust Region optimization (STR), which performs trust region policy optimization with the policy constrained within the support of the behavior policy, enjoying the less restrictive support constraint. We show that, assuming no approximation or sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset. Furthermore, with both errors incorporated, STR still guarantees safe policy improvement at each step. Empirical results validate the theory of STR and demonstrate its state-of-the-art performance on MuJoCo locomotion domains and the much more challenging AntMaze domains.
Author Information
Yixiu Mao (Tsinghua University)
Hongchang Zhang (Tsinghua University)
Chen Chen (Qiyuan Lab)
Yi Xu (Alibaba Group (U.S.) Inc.)
Xiangyang Ji (Tsinghua University)
More from the Same Authors
-
2023 Poster: Complementary Attention for Multi-Agent Reinforcement Learning »
Jianzhun Shao · Hongchang Zhang · Yun Qu · Chang Liu · Shuncheng He · Yuhang Jiang · Xiangyang Ji -
2023 Poster: No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation »
Feilong Zhang · Xianming Liu · Shiyi Lin · Gang Wu · Xiong Zhou · Junjun Jiang · Xiangyang Ji -
2022 Poster: Prototype-Anchored Learning for Learning with Imperfect Annotations »
Xiong Zhou · Xianming Liu · Deming Zhai · Junjun Jiang · Xin Gao · Xiangyang Ji -
2022 Spotlight: Prototype-Anchored Learning for Learning with Imperfect Annotations »
Xiong Zhou · Xianming Liu · Deming Zhai · Junjun Jiang · Xin Gao · Xiangyang Ji -
2021 Poster: Dash: Semi-Supervised Learning with Dynamic Thresholding »
Yi Xu · Lei Shang · Jinxing Ye · Qi Qian · Yu-Feng Li · Baigui Sun · Hao Li · rong jin -
2021 Oral: Dash: Semi-Supervised Learning with Dynamic Thresholding »
Yi Xu · Lei Shang · Jinxing Ye · Qi Qian · Yu-Feng Li · Baigui Sun · Hao Li · rong jin -
2021 Poster: Asymmetric Loss Functions for Learning with Noisy Labels »
Xiong Zhou · Xianming Liu · Junjun Jiang · Xin Gao · Xiangyang Ji -
2021 Spotlight: Asymmetric Loss Functions for Learning with Noisy Labels »
Xiong Zhou · Xianming Liu · Junjun Jiang · Xin Gao · Xiangyang Ji -
2021 Poster: Near Optimal Reward-Free Reinforcement Learning »
Zhang Zihan · Simon Du · Xiangyang Ji -
2021 Oral: Near Optimal Reward-Free Reinforcement Learning »
Zhang Zihan · Simon Du · Xiangyang Ji -
2021 Poster: Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity »
Zhang Zihan · Yuan Zhou · Xiangyang Ji -
2021 Spotlight: Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity »
Zhang Zihan · Yuan Zhou · Xiangyang Ji -
2020 Poster: Stochastic Optimization for Non-convex Inf-Projection Problems »
Yan Yan · Yi Xu · Lijun Zhang · Wang Xiaoyu · Tianbao Yang -
2019 Poster: Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence »
Yi Xu · Qi Qi · Qihang Lin · rong jin · Tianbao Yang -
2019 Oral: Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence »
Yi Xu · Qi Qi · Qihang Lin · rong jin · Tianbao Yang -
2019 Poster: Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number »
Zaiyi Chen · Yi Xu · Haoyuan Hu · Tianbao Yang -
2019 Oral: Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number »
Zaiyi Chen · Yi Xu · Haoyuan Hu · Tianbao Yang -
2018 Poster: SADAGRAD: Strongly Adaptive Stochastic Gradient Methods »
Zaiyi Chen · Yi Xu · Enhong Chen · Tianbao Yang -
2018 Oral: SADAGRAD: Strongly Adaptive Stochastic Gradient Methods »
Zaiyi Chen · Yi Xu · Enhong Chen · Tianbao Yang -
2017 Poster: Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence »
Yi Xu · Qihang Lin · Tianbao Yang -
2017 Talk: Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence »
Yi Xu · Qihang Lin · Tianbao Yang