The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value estimates; the variance-reduced policy gradient is then expected to improve learning efficiency. Recent research on control variates for deep neural network policies has focused mainly on scalar-valued baseline functions, and the effect of vector-valued baselines remains under-explored. This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural network policies. We present experimental evidence suggesting that such baselines can achieve lower variance than the conventional scalar-valued baseline. We demonstrate how to equip the popular Proximal Policy Optimization (PPO) algorithm with these new control variates, and show that the resulting algorithm, with proper regularization, can achieve higher sample efficiency than scalar control variates on continuous control benchmarks.
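For context, here is a minimal sketch in assumed notation (not taken verbatim from the paper). With a scalar baseline b(s), the per-sample policy-gradient estimator, and its coordinate-wise variant built from a vector-valued baseline with one entry per policy parameter, can be written as

    \hat{g} = \nabla_\theta \log \pi_\theta(a \mid s)\,\bigl(\hat{Q}(s,a) - b(s)\bigr)                                % scalar baseline
    \hat{g}_i = \partial_{\theta_i} \log \pi_\theta(a \mid s)\,\bigl(\hat{Q}(s,a) - b_i(s)\bigr), \quad i = 1,\dots,d  % one baseline entry per coordinate

Both estimators remain unbiased as long as the baseline does not depend on the action a, since E_a[\nabla_\theta \log \pi_\theta(a \mid s)] = 0; a layer-wise control variate would share a single baseline value across all coordinates belonging to one network layer.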
Author Information
Yuanyi Zhong (University of Illinois at Urbana-Champaign)
Yuan Zhou (UIUC)
Jian Peng (UIUC)
More from the Same Authors
- 2022: Is Self-Supervised Contrastive Learning More Robust Than Supervised Learning?
  Yuanyi Zhong · Haoran Tang · Junkun Chen · Jian Peng · Yu-Xiong Wang
- 2022 Poster: Off-Policy Reinforcement Learning with Delayed Rewards
  Beining Han · Zhizhou Ren · Zuofan Wu · Yuan Zhou · Jian Peng
- 2022 Spotlight: Off-Policy Reinforcement Learning with Delayed Rewards
  Beining Han · Zhizhou Ren · Zuofan Wu · Yuan Zhou · Jian Peng
- 2022 Poster: Proximal Exploration for Model-guided Protein Sequence Design
  Zhizhou Ren · Jiahan Li · Fan Ding · Yuan Zhou · Jianzhu Ma · Jian Peng
- 2022 Poster: Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
  Xingang Peng · Shitong Luo · Jiaqi Guan · Qi Xie · Jian Peng · Jianzhu Ma
- 2022 Spotlight: Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
  Xingang Peng · Shitong Luo · Jiaqi Guan · Qi Xie · Jian Peng · Jianzhu Ma
- 2022 Spotlight: Proximal Exploration for Model-guided Protein Sequence Design
  Zhizhou Ren · Jiahan Li · Fan Ding · Yuan Zhou · Jianzhu Ma · Jian Peng
- 2021 Poster: Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity
  Zhang Zihan · Yuan Zhou · Xiangyang Ji
- 2021 Spotlight: Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity
  Zhang Zihan · Yuan Zhou · Xiangyang Ji
- 2020 Poster: Multinomial Logit Bandit with Low Switching Cost
  Kefan Dong · Yingkai Li · Qin Zhang · Yuan Zhou
- 2020 Poster: A Chance-Constrained Generative Framework for Sequence Optimization
  Xianggen Liu · Qiang Liu · Sen Song · Jian Peng
- 2019 Poster: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
  Chengyue Gong · Jian Peng · Qiang Liu
- 2019 Poster: A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization
  Yucheng Chen · Matus Telgarsky · Chao Zhang · Bolton Bailey · Daniel Hsu · Jian Peng
- 2019 Oral: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
  Chengyue Gong · Jian Peng · Qiang Liu
- 2019 Oral: A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization
  Yucheng Chen · Matus Telgarsky · Chao Zhang · Bolton Bailey · Daniel Hsu · Jian Peng
- 2018 Poster: Learning to Explore via Meta-Policy Gradient
  Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng
- 2018 Oral: Learning to Explore via Meta-Policy Gradient
  Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng