Timezone: »

A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson

Wed Jun 12 03:10 PM -- 03:15 PM (PDT) @ Hall B

By enabling correct differentiation in Stochastic Computation Graphs (SCGs), the infinitely differentiable Monte-Carlo estimator (DiCE) can generate correct estimates for the higher order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning. However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first order gradient estimation, limiting the utility of higher-order gradient estimates. To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation. This term may be easily included in the objective, and produces unbiased variance-reduced estimators under (automatic) differentiation, without affecting the estimate of the objective itself or of the first order gradient estimate. It reuses the same baseline function (e.g., the state-value function in reinforcement learning) already used for the first order baseline. We provide theoretical analysis and numerical evaluations of this new baseline, which demonstrate that it can dramatically reduce the variance of DiCE's second order gradient estimators and also show empirically that it reduces the variance of third and fourth order gradients. This computational tool can be easily used to estimate higher order gradients with unprecedented efficiency and simplicity wherever automatic differentiation is utilised, and it has the potential to unlock applications of higher order gradients in reinforcement learning and meta-learning.

Author Information

Jingkai Mao (Man AHL)
Jakob Foerster (Facebook AI Research)

Jakob Foerster started as an Associate Professor at the department of engineering science at the University of Oxford in the fall of 2021. During his PhD at Oxford he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017, which he has helped organize ever since and was awarded a prestigious CIFAR AI chair in 2019. His past work addresses how AI agents can learn to cooperate and communicate with other agents, most recently he has been developing and addressing the zero-shot coordination problem setting, a crucial step towards human-AI coordination.

Tim Rocktäschel (University of Oxford)
Maruan Al-Shedivat (Carnegie Mellon University)
Gregory Farquhar (University of Oxford)
Shimon Whiteson (University of Oxford)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors