The likelihood-ratio method is widely used to estimate gradients of stochastic computations, but it requires baselines to reduce the variance of the estimates. Many types of baselines have been proposed, yet how close they come to optimal is not well understood. In this study, we establish a novel framework of gradient estimation that includes most common gradient estimators as special cases. The framework yields a natural derivation of the optimal estimator, which can be interpreted as a special case of the likelihood-ratio method, so that we can evaluate how close practical techniques come to this optimum. The framework bridges the likelihood-ratio method and the reparameterization trick while still supporting discrete variables. It is derived by exchanging differentiation and integration; more specifically, it combines the reparameterization trick with local marginalization analogous to the local expectation gradient. We evaluate various baselines and the optimal estimator for variational learning and show that the performance of modern estimators is close to that of the optimal estimator.
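To make the setting concrete, below is a minimal sketch (not the paper's implementation) of the likelihood-ratio (score-function) gradient estimator with a constant baseline, for a single Bernoulli variable whose parameter is the sigmoid of a scalar logit. The objective `f`, the baseline choice, and all function names are illustrative assumptions; the point is only that subtracting a baseline leaves the estimator unbiased while reducing its variance, and that for a binary variable the expectation can also be marginalized exactly, which is the idea exploited by local-marginalization-style estimators.

```python
# Sketch of a likelihood-ratio gradient estimate with a constant baseline.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f(z):
    # Arbitrary objective whose expectation we differentiate w.r.t. theta.
    return (z - 0.3) ** 2

def lr_gradient(theta, n_samples=10_000, baseline=0.0):
    """Monte Carlo estimate of d/dtheta E_{z ~ Bernoulli(sigmoid(theta))}[f(z)]
    using the likelihood-ratio method with a constant baseline."""
    p = sigmoid(theta)
    z = rng.binomial(1, p, size=n_samples).astype(float)
    # Score function d/dtheta log p_theta(z) for a Bernoulli with logit theta.
    score = z - p
    # Subtracting a constant baseline keeps the estimator unbiased because
    # E[score] = 0, but it can reduce the variance of the samples.
    return np.mean((f(z) - baseline) * score)

def exact_gradient(theta):
    # For a binary z the expectation marginalizes exactly:
    # E[f(z)] = (1 - p) f(0) + p f(1), so the gradient is p (1 - p) (f(1) - f(0)).
    p = sigmoid(theta)
    return p * (1.0 - p) * (f(1.0) - f(0.0))

theta = 0.5
print("exact gradient:    ", exact_gradient(theta))
print("LR, no baseline:   ", lr_gradient(theta, baseline=0.0))
print("LR, with baseline: ", lr_gradient(theta, baseline=f(0.0)))
```

Both Monte Carlo estimates converge to the exact value; the baselined version typically shows a smaller spread across repeated runs, which is the kind of variance comparison the study carries out for richer models and estimators.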
Author Information
Seiya Tokui (Preferred Networks / The University of Tokyo)
Seiya Tokui is a researcher at Preferred Networks, Inc., Japan, and also a Ph.D. student at the University of Tokyo. He received his master's degree in mathematical informatics from the University of Tokyo in 2012. He is the lead developer of the deep learning framework Chainer. His current research interests include deep learning, its software design, computer vision, and natural language processing.
Issei Sato (University of Tokyo / RIKEN)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Poster: Evaluating the Variance of Likelihood-Ratio Gradient Estimators
  Mon. Aug 7th, 08:30 AM -- 12:00 PM, Room Gallery #33
More from the Same Authors
- 2021 Poster: Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification
  Nan Lu · Shida Lei · Gang Niu · Issei Sato · Masashi Sugiyama
- 2021 Spotlight: Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification
  Nan Lu · Shida Lei · Gang Niu · Issei Sato · Masashi Sugiyama
- 2020 Poster: Few-shot Domain Adaptation by Causal Mechanism Transfer
  Takeshi Teshima · Issei Sato · Masashi Sugiyama
- 2020 Poster: Accelerating the diffusion-based ensemble sampling by non-reversible dynamics
  Futoshi Futami · Issei Sato · Masashi Sugiyama
- 2020 Poster: Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks Using PAC-Bayesian Analysis
  Yusuke Tsuzuku · Issei Sato · Masashi Sugiyama
- 2018 Poster: Does Distributionally Robust Supervised Learning Give Robust Classifiers?
  Weihua Hu · Gang Niu · Issei Sato · Masashi Sugiyama
- 2018 Oral: Does Distributionally Robust Supervised Learning Give Robust Classifiers?
  Weihua Hu · Gang Niu · Issei Sato · Masashi Sugiyama
- 2018 Poster: Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
  Hideaki Imamura · Issei Sato · Masashi Sugiyama
- 2018 Oral: Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
  Hideaki Imamura · Issei Sato · Masashi Sugiyama
- 2017 Poster: Learning Discrete Representations via Information Maximizing Self-Augmented Training
  Weihua Hu · Takeru Miyato · Seiya Tokui · Eiichi Matsumoto · Masashi Sugiyama
- 2017 Talk: Learning Discrete Representations via Information Maximizing Self-Augmented Training
  Weihua Hu · Takeru Miyato · Seiya Tokui · Eiichi Matsumoto · Masashi Sugiyama