We study deep reinforcement learning (RL) algorithms with delayed rewards. In many real-world tasks, instant rewards are often not readily accessible, or not even well defined, immediately after the agent performs an action. In this work, we first formally define the environment with delayed rewards and discuss the challenges arising from the non-Markovian nature of such environments. We then introduce a general off-policy RL framework with a new Q-function formulation that can handle delayed rewards with theoretical convergence guarantees. For practical tasks with high-dimensional state spaces, we further introduce the HC-decomposition rule for the Q-function in our framework, which naturally leads to an approximation scheme that helps boost training efficiency and stability. Finally, we conduct extensive experiments to demonstrate the superior performance of our algorithms over existing methods and their variants.
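To make the delayed-reward setting concrete, here is a minimal illustrative sketch (not the code accompanying this paper): a Gymnasium-style wrapper, with an assumed delay interval `delay`, that withholds per-step rewards and releases their accumulated sum only at fixed intervals or at episode end. This produces the kind of non-Markovian reward signal the abstract describes.

```python
# A minimal sketch of a delayed-reward environment (hypothetical wrapper,
# not the paper's implementation): per-step rewards are buffered and the
# agent only observes their sum every `delay` steps or at episode end.
import gymnasium as gym


class DelayedRewardWrapper(gym.Wrapper):
    def __init__(self, env: gym.Env, delay: int = 10):
        super().__init__(env)
        self.delay = delay      # assumed reward-release interval
        self._buffer = 0.0      # accumulated, not-yet-observed reward
        self._t = 0             # steps since the last release

    def reset(self, **kwargs):
        self._buffer, self._t = 0.0, 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._buffer += reward
        self._t += 1
        # Release the aggregated reward only at delay boundaries or when the
        # episode ends; intermediate steps yield zero observed reward.
        if self._t % self.delay == 0 or terminated or truncated:
            delayed_reward, self._buffer = self._buffer, 0.0
        else:
            delayed_reward = 0.0
        return obs, delayed_reward, terminated, truncated, info


# Example usage (assuming a MuJoCo task is installed):
# env = DelayedRewardWrapper(gym.make("HalfCheetah-v4"), delay=10)
```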
Author Information
Beining Han (IIIS, Tsinghua)
Zhizhou Ren (University of Illinois at Urbana-Champaign)
Zuofan Wu (Helixon Research)
Yuan Zhou (UIUC)
Jian Peng (UIUC)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Off-Policy Reinforcement Learning with Delayed Rewards
  Thu. Jul 21st through Fri. the 22nd, Room Hall E #927
More from the Same Authors
- 2021: Coordinate-wise Control Variates for Deep Policy Gradients
  Yuanyi Zhong · Yuan Zhou · Jian Peng
- 2022: Is Self-Supervised Contrastive Learning More Robust Than Supervised Learning?
  Yuanyi Zhong · Haoran Tang · Junkun Chen · Jian Peng · Yu-Xiong Wang
- 2022 Poster: Proximal Exploration for Model-guided Protein Sequence Design
  Zhizhou Ren · Jiahan Li · Fan Ding · Yuan Zhou · Jianzhu Ma · Jian Peng
- 2022 Poster: Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
  Xingang Peng · Shitong Luo · Jiaqi Guan · Qi Xie · Jian Peng · Jianzhu Ma
- 2022 Spotlight: Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
  Xingang Peng · Shitong Luo · Jiaqi Guan · Qi Xie · Jian Peng · Jianzhu Ma
- 2022 Spotlight: Proximal Exploration for Model-guided Protein Sequence Design
  Zhizhou Ren · Jiahan Li · Fan Ding · Yuan Zhou · Jianzhu Ma · Jian Peng
- 2022 Poster: Self-Organized Polynomial-Time Coordination Graphs
  Qianlan Yang · Weijun Dong · Zhizhou Ren · Jianhao Wang · Tonghan Wang · Chongjie Zhang
- 2022 Spotlight: Self-Organized Polynomial-Time Coordination Graphs
  Qianlan Yang · Weijun Dong · Zhizhou Ren · Jianhao Wang · Tonghan Wang · Chongjie Zhang
- 2021 Poster: Generalizable Episodic Memory for Deep Reinforcement Learning
  Hao Hu · Jianing Ye · Guangxiang Zhu · Zhizhou Ren · Chongjie Zhang
- 2021 Spotlight: Generalizable Episodic Memory for Deep Reinforcement Learning
  Hao Hu · Jianing Ye · Guangxiang Zhu · Zhizhou Ren · Chongjie Zhang
- 2021 Poster: Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity
  Zhang Zihan · Yuan Zhou · Xiangyang Ji
- 2021 Spotlight: Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity
  Zhang Zihan · Yuan Zhou · Xiangyang Ji
- 2020 Poster: Multinomial Logit Bandit with Low Switching Cost
  Kefan Dong · Yingkai Li · Qin Zhang · Yuan Zhou
- 2020 Poster: A Chance-Constrained Generative Framework for Sequence Optimization
  Xianggen Liu · Qiang Liu · Sen Song · Jian Peng
- 2019 Poster: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
  Chengyue Gong · Jian Peng · Qiang Liu
- 2019 Poster: A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization
  Yucheng Chen · Matus Telgarsky · Chao Zhang · Bolton Bailey · Daniel Hsu · Jian Peng
- 2019 Oral: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
  Chengyue Gong · Jian Peng · Qiang Liu
- 2019 Oral: A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization
  Yucheng Chen · Matus Telgarsky · Chao Zhang · Bolton Bailey · Daniel Hsu · Jian Peng
- 2018 Poster: Learning to Explore via Meta-Policy Gradient
  Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng
- 2018 Oral: Learning to Explore via Meta-Policy Gradient
  Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng