Timezone: »
Oral
Do Differentiable Simulators Give Better Policy Gradients?
Hyung Ju Suh · Max Simchowitz · Kaiqing Zhang · Russ Tedrake
Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on some numerical examples.
Author Information
Hyung Ju Suh (MIT)
Max Simchowitz (MIT)
Kaiqing Zhang (MIT)
Russ Tedrake (MIT)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: Do Differentiable Simulators Give Better Policy Gradients? »
Thu. Jul 21st through Fri the 22nd Room Hall E #119
More from the Same Authors
-
2021 : Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity »
Kaiqing Zhang · Xiangyuan Zhang · Bin Hu · Tamer Basar -
2021 : Decentralized Q-Learning in Zero-sum Markov Games »
Kaiqing Zhang · David Leslie · Tamer Basar · Asuman Ozdaglar -
2023 : Toward Understanding Latent Model Learning in MuZero: A Case Study in Linear Quadratic Gaussian Control »
Yi Tian · Kaiqing Zhang · Russ Tedrake · Suvrit Sra -
2023 : On the Imitation of Non-Markovian Demonstrations: From Low-Level Stability to High-Level Planning »
Adam Block · Daniel Pfrommer · Max Simchowitz -
2023 Poster: The Power of Learned Locally Linear Models for Nonlinear Policy Optimization »
Daniel Pfrommer · Max Simchowitz · Tyler Westenbroek · Nikolai Matni · Stephen Tu -
2023 Poster: Statistical Learning under Heterogenous Distribution Shift »
Max Simchowitz · Anurag Ajay · Pulkit Agrawal · Akshay Krishnamurthy -
2022 : What is a Good Metric to Study Generalization of Minimax Learners? »
Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang -
2022 Poster: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning »
Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar -
2022 Spotlight: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning »
Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar -
2022 Spotlight: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Oral: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence »
Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic -
2022 Oral: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence »
Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic -
2021 Poster: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs »
Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar -
2021 Spotlight: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs »
Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar -
2021 Poster: Reinforcement Learning for Cost-Aware Markov Decision Processes »
Wesley A Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer -
2021 Spotlight: Reinforcement Learning for Cost-Aware Markov Decision Processes »
Wesley A Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer -
2020 Poster: FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis »
Aman Sinha · Matthew O'Kelly · Hongrui Zheng · Rahul Mangharam · John Duchi · Russ Tedrake -
2018 Poster: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents »
Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar -
2018 Oral: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents »
Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar