Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize the parameters of a policy by maximizing the expected total reward or a surrogate of it. Although these methods often achieve encouraging empirical success, their correspondence to policy-distribution optimization has been mathematically unclear. We place policy optimization in the space of probability measures and interpret it as a Wasserstein gradient flow. On the probability-measure space, under specified circumstances, policy optimization becomes convex in terms of distribution optimization. To make optimization feasible, we develop efficient algorithms by numerically solving the corresponding discrete gradient flows. Our technique is applicable to several RL settings and is related to many state-of-the-art policy-optimization algorithms. Specifically, we define gradient flows on both the parameter-distribution space and the policy-distribution space, leading to what we term indirect-policy and direct-policy learning frameworks, respectively. Extensive experiments verify the effectiveness of our framework, which often performs better than related algorithms.
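To give a rough sense of what "numerically solving a discrete gradient flow" can look like, the sketch below simulates Langevin dynamics with a set of particles. This is a standard particle discretization of the Wasserstein gradient flow of the KL divergence to a target density. It is an illustration only and not the authors' algorithm: the one-dimensional Gaussian target, the function names `langevin_flow` and `run_demo`, and all step sizes and particle counts are assumptions chosen for the example, with the target standing in for a distribution over policy parameters.

```python
import math
import random


def langevin_flow(score, particles, step_size, n_steps, rng):
    """Move particles with Euler-Maruyama steps of Langevin dynamics.

    `score` is the gradient of the log target density. Each step adds a
    drift toward high-density regions plus Gaussian noise, so the particle
    cloud approximately follows the gradient flow of KL(. || target).
    """
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(n_steps):
        particles = [
            x + step_size * score(x) + noise_scale * rng.gauss(0.0, 1.0)
            for x in particles
        ]
    return particles


def run_demo(seed=0):
    """Run the flow on a hypothetical 1D Gaussian target (assumption)."""
    rng = random.Random(seed)
    target_mean, target_std = 2.0, 1.0  # illustrative values only

    def score(x):
        # d/dx log N(x; target_mean, target_std^2)
        return -(x - target_mean) / target_std ** 2

    # Start far from the target so the transport is visible.
    init = [rng.gauss(-3.0, 0.5) for _ in range(1000)]
    final = langevin_flow(score, init, step_size=0.05, n_steps=200, rng=rng)

    mean = sum(final) / len(final)
    var = sum((x - mean) ** 2 for x in final) / len(final)
    return mean, var
```

Calling `run_demo()` returns the empirical mean and variance of the final particles, which should end up close to those of the assumed target. In an RL setting the score would presumably come from a reward-dependent energy over policy parameters, which this sketch does not model.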
Author Information
RUIYI (ROY) ZHANG (Duke University)
Changyou Chen (SUNY at Buffalo)
Chunyuan Li (Duke University)
Lawrence Carin (Duke)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: Policy Optimization as Wasserstein Gradient Flows
  Fri. Jul 13th 02:20 -- 02:30 PM Room A1
More from the Same Authors
- 2021: Hölder Bounds for Sensitivity Analysis in Causal Reasoning
  Serge Assaad · Shuxi Zeng · Henry Pfister · Fan Li · Lawrence Carin
- 2023 Poster: Learning Unnormalized Statistical Models via Compositional Optimization
  Wei Jiang · Jiayu Qin · Lingyu Wu · Changyou Chen · Tianbao Yang · Lijun Zhang
- 2023 Poster: Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction
  Jianyi Zhang · Ang Li · Minxue Tang · Jingwei Sun · Xiang Chen · Fan Zhang · Changyou Chen · Yiran Chen · Hai Li
- 2020 Poster: Learning Autoencoders with Relational Regularization
  Hongteng Xu · Dixin Luo · Ricardo Henao · Svati Shah · Lawrence Carin
- 2020 Poster: Graph Optimal Transport for Cross-Domain Alignment
  Liqun Chen · Zhe Gan · Yu Cheng · Linjie Li · Lawrence Carin · Jingjing Liu
- 2020 Poster: On Leveraging Pretrained GANs for Generation with Limited Data
  Miaoyun Zhao · Yulai Cong · Lawrence Carin
- 2020 Poster: CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information
  Pengyu Cheng · Weituo Hao · Shuyang Dai · Jiachang Liu · Zhe Gan · Lawrence Carin
- 2020 Poster: Variance Reduction in Stochastic Particle-Optimization Sampling
  Jianyi Zhang · Yang Zhao · Changyou Chen
- 2019: Invited Talk - Ruiyi Zhang: On Wasserstein Gradient Flows and Particle-Based Variational Inference
  RUIYI (ROY) ZHANG
- 2019 Poster: Gromov-Wasserstein Learning for Graph Matching and Node Embedding
  Hongteng Xu · Dixin Luo · Hongyuan Zha · Lawrence Carin
- 2019 Oral: Gromov-Wasserstein Learning for Graph Matching and Node Embedding
  Hongteng Xu · Dixin Luo · Hongyuan Zha · Lawrence Carin
- 2019 Poster: Stochastic Blockmodels meet Graph Neural Networks
  Nikhil Mehta · Lawrence Carin · Piyush Rai
- 2019 Poster: Understanding and Accelerating Particle-Based Variational Inference
  Chang Liu · Jingwei Zhuo · Pengyu Cheng · RUIYI (ROY) ZHANG · Jun Zhu
- 2019 Poster: Variational Annealing of GANs: A Langevin Perspective
  Chenyang Tao · Shuyang Dai · Liqun Chen · Ke Bai · Junya Chen · Chang Liu · RUIYI (ROY) ZHANG · Georgiy Bobashev · Lawrence Carin
- 2019 Oral: Stochastic Blockmodels meet Graph Neural Networks
  Nikhil Mehta · Lawrence Carin · Piyush Rai
- 2019 Oral: Understanding and Accelerating Particle-Based Variational Inference
  Chang Liu · Jingwei Zhuo · Pengyu Cheng · RUIYI (ROY) ZHANG · Jun Zhu
- 2019 Oral: Variational Annealing of GANs: A Langevin Perspective
  Chenyang Tao · Shuyang Dai · Liqun Chen · Ke Bai · Junya Chen · Chang Liu · RUIYI (ROY) ZHANG · Georgiy Bobashev · Lawrence Carin
- 2018 Poster: Learning Registered Point Processes from Idiosyncratic Observations
  Hongteng Xu · Lawrence Carin · Hongyuan Zha
- 2018 Poster: JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets
  Yunchen Pu · Shuyang Dai · Zhe Gan · Weiyao Wang · Guoyin Wang · Yizhe Zhang · Ricardo Henao · Lawrence Carin
- 2018 Oral: JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets
  Yunchen Pu · Shuyang Dai · Zhe Gan · Weiyao Wang · Guoyin Wang · Yizhe Zhang · Ricardo Henao · Lawrence Carin
- 2018 Oral: Learning Registered Point Processes from Idiosyncratic Observations
  Hongteng Xu · Lawrence Carin · Hongyuan Zha
- 2018 Poster: Adversarial Time-to-Event Modeling
  Paidamoyo Chapfuwa · Chenyang Tao · Chunyuan Li · Courtney Page · Benjamin Goldstein · Lawrence Carin · Ricardo Henao
- 2018 Oral: Adversarial Time-to-Event Modeling
  Paidamoyo Chapfuwa · Chenyang Tao · Chunyuan Li · Courtney Page · Benjamin Goldstein · Lawrence Carin · Ricardo Henao
- 2018 Poster: Continuous-Time Flows for Efficient Inference and Density Estimation
  Changyou Chen · Chunyuan Li · Liquan Chen · Wenlin Wang · Yunchen Pu · Lawrence Carin
- 2018 Poster: Chi-square Generative Adversarial Network
  Chenyang Tao · Liqun Chen · Ricardo Henao · Jianfeng Feng · Lawrence Carin
- 2018 Poster: Variational Inference and Model Selection with Generalized Evidence Bounds
  Liqun Chen · Chenyang Tao · RUIYI (ROY) ZHANG · Ricardo Henao · Lawrence Carin
- 2018 Oral: Chi-square Generative Adversarial Network
  Chenyang Tao · Liqun Chen · Ricardo Henao · Jianfeng Feng · Lawrence Carin
- 2018 Oral: Continuous-Time Flows for Efficient Inference and Density Estimation
  Changyou Chen · Chunyuan Li · Liquan Chen · Wenlin Wang · Yunchen Pu · Lawrence Carin
- 2018 Oral: Variational Inference and Model Selection with Generalized Evidence Bounds
  Liqun Chen · Chenyang Tao · RUIYI (ROY) ZHANG · Ricardo Henao · Lawrence Carin
- 2017 Poster: Stochastic Gradient Monomial Gamma Sampler
  Yizhe Zhang · Changyou Chen · Zhe Gan · Ricardo Henao · Lawrence Carin
- 2017 Poster: Adversarial Feature Matching for Text Generation
  Yizhe Zhang · Zhe Gan · Kai Fan · Zhi Chen · Ricardo Henao · Dinghan Shen · Lawrence Carin
- 2017 Talk: Adversarial Feature Matching for Text Generation
  Yizhe Zhang · Zhe Gan · Kai Fan · Zhi Chen · Ricardo Henao · Dinghan Shen · Lawrence Carin
- 2017 Talk: Stochastic Gradient Monomial Gamma Sampler
  Yizhe Zhang · Changyou Chen · Zhe Gan · Ricardo Henao · Lawrence Carin
- 2017 Poster: Deep Generative Models for Relational Data with Side Information
  Changwei Hu · Piyush Rai · Lawrence Carin
- 2017 Talk: Deep Generative Models for Relational Data with Side Information
  Changwei Hu · Piyush Rai · Lawrence Carin