

Session: Deep Generative Models


Thu 13 June 9:00 - 9:20 PDT

State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations

Alex Lamb · Jonathan Binas · Anirudh Goyal · Sandeep Subramanian · Ioannis Mitliagkas · Yoshua Bengio · Michael Mozer

Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We introduce a method, which we refer to as state reification, that involves modeling the distribution of hidden states over the training data and then projecting hidden states observed during testing toward this distribution. Our intuition is that if the network can remain in a familiar manifold of hidden space, subsequent layers of the net should be well trained to respond appropriately. We show that this state-reification method helps neural nets to generalize better, especially when labeled data are sparse, and also helps overcome the challenge of achieving robust generalization with adversarial training.
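
A minimal sketch of the idea (ours, not the authors' released code; all names are hypothetical): one way to reify hidden states is to train a denoising autoencoder on a layer's activations, so that states observed at test time are pulled back toward the manifold seen during training.

import torch
import torch.nn as nn

class StateReifier(nn.Module):
    # Denoising autoencoder over a layer's hidden activations. During training it
    # learns to map perturbed states back onto the training-state manifold; at test
    # time it projects whatever state it observes toward that manifold.
    def __init__(self, hidden_dim, bottleneck_dim=64, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encode = nn.Sequential(nn.Linear(hidden_dim, bottleneck_dim), nn.ReLU())
        self.decode = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, h):
        if self.training:
            h_noisy = h + self.noise_std * torch.randn_like(h)
            h_proj = self.decode(self.encode(h_noisy))
            recon_loss = ((h_proj - h.detach()) ** 2).mean()  # add to the task loss
            return h_proj, recon_loss
        return self.decode(self.encode(h)), None

The projected state, rather than the raw one, is then passed to the subsequent layers of the network.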

Thu 13 June 9:20 - 9:25 PDT

Variational Laplace Autoencoders

Yookoon Park · Chris Kim · Gunhee Kim

Variational autoencoders employ an amortized inference model to predict the approximate posterior of latent variables. However, such amortized variational inference (AVI) faces two challenges: 1) the limited expressiveness of the fully-factorized Gaussian posterior assumption and 2) the amortization error of the inference model. We propose an extended model, the Variational Laplace Autoencoder, that overcomes both challenges and improves the training of deep generative models. Specifically, we start from a class of rectified-linear-activation neural networks with Gaussian output and make a connection to probabilistic PCA. As a result, we derive iterative update equations that discover the mode of the posterior and define a local full-covariance Gaussian approximation centered at the mode. From the perspective of the Laplace approximation, a generalization to a differentiable class of output distributions and activation functions is presented. Empirical results on MNIST, OMNIGLOT, FashionMNIST, SVHN and CIFAR10 show that the proposed approach significantly outperforms other amortized or iterative methods.
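
For reference, the generic Laplace approximation the abstract builds on replaces the amortized fully-factorized Gaussian with a full-covariance Gaussian centered at a mode of the posterior (notation ours):

z^\star = \arg\max_z \log p_\theta(x, z), \qquad q(z \mid x) = \mathcal{N}\big(z;\, z^\star,\, \Lambda^{-1}\big), \qquad \Lambda = -\nabla_z^2 \log p_\theta(x, z)\,\big|_{z = z^\star}.

In the rectified-linear/Gaussian case the network is piecewise linear, so within each linear region the posterior is exactly Gaussian (as in probabilistic PCA); this is what the iterative mode-finding updates mentioned above exploit.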

Thu 13 June 9:25 - 9:30 PDT

Latent Normalizing Flows for Discrete Sequences

Zachary Ziegler · Alexander Rush

Normalizing flows have been shown to be a powerful class of generative models for continuous random variables, giving both strong performance and the potential for non-autoregressive generation. These benefits are also desired when modeling discrete random variables such as text, but directly applying normalizing flows to discrete sequences poses significant additional challenges. We propose a generative model which jointly learns a normalizing flow-based distribution in the latent space and a stochastic mapping to an observed discrete space. In this setting, we find that it is crucial for the flow-based distribution to be highly multimodal. To capture this property, we propose several normalizing flow architectures to maximize model flexibility. Experiments consider common discrete sequence tasks of character-level language modeling and polyphonic music generation. Our results indicate that an autoregressive flow-based model can match the performance of a comparable autoregressive baseline, and a non-autoregressive flow-based model can improve generation speed with a penalty to performance.
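
Schematically (our notation, not the paper's), the model pairs a flow-based prior over a continuous latent sequence with an emission of the discrete symbols; in the simplest, non-autoregressive variant,

p(x) = \int p_\epsilon\big(f^{-1}(z)\big)\,\Big|\det \tfrac{\partial f^{-1}(z)}{\partial z}\Big| \, \prod_{t=1}^{T} p(x_t \mid z_t)\, dz,

where f is an invertible flow mapping base noise \epsilon to the latent sequence z = (z_1, \dots, z_T). Because the emission factorizes over positions, the flow prior must carry both the inter-symbol dependence and the essentially discrete structure of the data; this is the multimodality requirement noted above.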

Thu 13 June 9:30 - 9:35 PDT

Multi-objective training of Generative Adversarial Networks with multiple discriminators

Isabela Albuquerque · Joao Monteiro · Thang Doan · Breandan Considine · Tiago Falk · Ioannis Mitliagkas

Recent literature has demonstrated promising results on the training of Generative Adversarial Networks by employing a set of discriminators, as opposed to the traditional game involving one generator against a single adversary. Those methods perform single-objective optimization on some simple consolidation of the losses, e.g. an average. In this work, we revisit the multiple-discriminator approach by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem. Specifically, we evaluate the performance of multiple gradient descent and the hypervolume maximization algorithm on a number of different datasets. Moreover, we argue that the previously proposed methods and hypervolume maximization can all be seen as variations of multiple gradient descent in which the update direction can be computed efficiently. Our results indicate that, compared with previous methods, hypervolume maximization offers a better trade-off between sample quality and diversity on the one hand and computational cost on the other.
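
As a concrete instance (single-solution hypervolume maximization in our notation; the paper's exact parameterization may differ): given the generator's losses \ell_1, \dots, \ell_K against the K discriminators and a nadir point \eta upper-bounding them, the generator minimizes

\mathcal{L}_G = -\sum_{k=1}^{K} \log(\eta - \ell_k), \qquad \text{so that} \qquad \nabla_\theta \mathcal{L}_G = \sum_{k=1}^{K} \frac{1}{\eta - \ell_k}\, \nabla_\theta \ell_k.

Each discriminator's gradient is weighted by the inverse of its slack to the nadir point, so the adversaries against which the generator is currently doing worst receive the largest weight; in this sense the scheme is an adaptively weighted variant of multiple gradient descent.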

Thu 13 June 9:35 - 9:40 PDT

Learning Discrete and Continuous Factors of Data via Alternating Disentanglement

Yeonwoo Jeong · Hyun Oh Song

We address the problem of unsupervised disentanglement of discrete and continuous explanatory factors of data. We first show a simple procedure for minimizing the total correlation of the continuous latent variables, without having to use a discriminator network or perform importance sampling, by cascading the information flow in the beta-VAE framework. Furthermore, we propose a method that avoids offloading the entire burden of jointly modeling the continuous and discrete factors to the variational encoder by employing a separate discrete inference procedure.

This leads to an interesting alternating minimization problem which switches between finding the most likely discrete configuration given the continuous factors and updating the variational encoder based on the computed discrete factors. Experiments show that the proposed method clearly disentangles discrete factors and significantly outperforms current disentanglement methods based on the disentanglement score and inference network classification score.
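
Schematically (our notation), with a continuous code z drawn from the variational encoder q_\phi(z \mid x) and a discrete factor c, each training step alternates

c^{\star} = \arg\max_{c} \log p_\theta(x \mid z, c), \qquad (\theta, \phi) \leftarrow (\theta, \phi) - \alpha \, \nabla_{\theta, \phi}\, \mathcal{L}_{\beta\text{-VAE}}(x; c^{\star}),

i.e. a hard assignment of the discrete configuration given the current continuous factors, followed by a gradient step on the cascaded beta-VAE objective with that assignment held fixed.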

Thu 13 June 9:40 - 10:00 PDT

Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables

Friso Kingma · Pieter Abbeel · Jonathan Ho

The "bits back" argument (Wallace, 1990; Hinton & Van Camp, 1993) suggests lossless compression schemes based on latent variable models. However, how to translate the "bits back" argument into efficient and practical lossless compression schemes is still largely an open problem. Bits-Back with Asymmetric Numeral Systems (Townsend et al., 2018) makes "bits back" coding practically feasible, yet when executed on hierarchical latent variable models the algorithm becomes substantially inefficient. In this paper we propose Bit-Swap, a compression scheme that generalizes existing lossless compression techniques and achieves strictly better compression rates for hierarchical latent variable models. Through experiments we verify that the proposed technique yields lossless compression rates that are empirically superior to those of existing techniques.
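
For context, the target rate for bits-back coding is the negative evidence lower bound (a standard identity, stated in our notation): decoding z from auxiliary bits using q(z \mid x), encoding x with p(x \mid z) and z with p(z), and crediting the bits recovered for z gives an expected net code length of

\mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z) - \log p(z) + \log q(z \mid x)\big] = -\mathrm{ELBO}(x) \;\ge\; -\log p(x).

Roughly, Bit-Swap interleaves these decode/encode steps across the layers of a hierarchical model, so that the auxiliary bits required up front do not grow with the depth of the hierarchy, which is where the earlier scheme becomes inefficient.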

Thu 13 June 10:00 - 10:05 PDT

Graphite: Iterative Generative Modeling of Graphs

Aditya Grover · Aaron Zweig · Stefano Ermon

Graphs are a fundamental abstraction for modeling relational data. However, graphs are discrete and combinatorial in nature, and learning representations suitable for machine learning tasks poses statistical and computational challenges. In this work, we propose Graphite, an algorithmic framework for unsupervised learning of representations over nodes in large graphs using deep latent variable generative models. Our model is based on a novel combination of graph neural networks with variational autoencoders (VAE), and uses an iterative graph refinement strategy for decoding. This permits scaling to large graphs with thousands of nodes. Theoretically, we draw novel connections of graph neural networks with approximate inference via kernel embeddings. Empirically, Graphite outperforms competing approaches for the tasks of density estimation, link prediction, and node classification on synthetic and benchmark datasets.
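
A rough sketch of the iterative-refinement decoding idea (ours, not the released Graphite code; all names are hypothetical): alternate between forming a soft adjacency matrix from the current node embeddings and propagating the embeddings over it, then emit edge probabilities.

import torch
import torch.nn as nn

class IterativeGraphDecoder(nn.Module):
    # Decode edge probabilities from latent node embeddings z (num_nodes x latent_dim)
    # by repeatedly (i) building a soft adjacency from the current embeddings and
    # (ii) smoothing the embeddings over that adjacency.
    def __init__(self, latent_dim, hidden_dim=32, num_iters=2):
        super().__init__()
        self.num_iters = num_iters
        self.update = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                                    nn.Linear(hidden_dim, latent_dim))

    def forward(self, z):
        h = z
        for _ in range(self.num_iters):
            adj = torch.sigmoid(h @ h.t())                             # soft adjacency
            adj = adj / adj.sum(dim=1, keepdim=True).clamp_min(1e-8)   # row-normalize
            h = self.update(adj @ h)                                   # one refinement pass
        return torch.sigmoid(h @ h.t())                                # edge probabilities

In a VAE setting, a decoder of this form would be trained jointly with a graph-neural-network encoder under the usual evidence lower bound.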

Thu 13 June 10:05 - 10:10 PDT

Hybrid Models with Deep and Invertible Features

Eric Nalisnick · Akihiro Matsukawa · Yee-Whye Teh · Dilan Gorur · Balaji Lakshminarayanan

Deep neural networks are powerful black-box predictors for modeling conditional distributions of the form p(target|features). While they can be very successful at supervised learning problems where the train and test distributions are the same, they can make overconfident wrong predictions when the test distribution is different. Hybrid models that include both a discriminative conditional model p(target|features) and a generative model p(features) can be more robust under dataset shift, as they can detect covariate shift using the generative model. Current state-of-the-art hybrid models require approximate inference, which can be computationally expensive. We propose a hybrid model that defines a generalized linear model on top of deep invertible features (e.g. normalizing flows). An attractive property of our model is that both the log density p(features) and the predictive distribution p(target|features) can be computed exactly in a single feed-forward pass. We show that our hybrid model achieves predictive accuracy similar to that of purely discriminative models on classification and regression tasks, while providing better uncertainty quantification and the ability to detect out-of-distribution inputs. In addition, we demonstrate that the generative component of the hybrid model can leverage unlabeled data for semi-supervised learning, as well as generate samples that are useful for visualizing and interpreting the model. The availability of the exact joint density p(target,features) also allows us to compute many quantities readily, making our hybrid model a useful building block for downstream applications of probabilistic deep learning, including but not limited to active learning and domain adaptation.
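
Concretely (our notation), with an invertible feature extractor f and a generalized linear model on z = f(x), the exact joint density decomposes as

\log p(x, y) = \log p\big(y \mid f(x)\big) + \log p_Z\big(f(x)\big) + \log \Big|\det \tfrac{\partial f(x)}{\partial x}\Big|,

so a single forward pass through f yields both the predictive distribution p(y \mid x), from the generalized linear head, and the marginal density p(x), by the change-of-variables formula. This is what enables the out-of-distribution detection and semi-supervised uses described above.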

Thu 13 June 10:10 - 10:15 PDT

MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets

Pierre-Alexandre Mattei · Jes Frellsen

We consider the problem of handling missing data with deep latent variable models (DLVMs). First, we present a simple technique to train DLVMs when the training set contains missing-at-random data. Our approach, called MIWAE, is based on the importance-weighted autoencoder (IWAE), and maximises a potentially tight lower bound of the log-likelihood of the observed data. Compared to the original IWAE, our algorithm does not induce any additional computational overhead due to the missing data. We also develop Monte Carlo techniques for single and multiple imputation using a DLVM trained on an incomplete data set. We illustrate our approach by training a convolutional DLVM on a static binarisation of MNIST that contains 50% of missing pixels. Leveraging multiple imputation, a convolutional network trained on these incomplete digits has a test performance similar to one trained on complete data. On various continuous and binary data sets, we also show that MIWAE provides accurate single imputations, and is highly competitive with state-of-the-art methods.
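
The bound being maximised is the importance-weighted objective applied to the observed entries only (our notation): splitting each example into an observed part x^o and a missing part x^m, and drawing K importance samples,

\mathcal{L}_K(x^o) = \mathbb{E}_{z_1, \dots, z_K \sim q_\phi(z \mid x^o)}\left[\log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(x^o \mid z_k)\, p(z_k)}{q_\phi(z_k \mid x^o)}\right] \;\le\; \log p_\theta(x^o).

With a decoder that factorizes over dimensions, evaluating p_\theta(x^o \mid z_k) amounts to dropping the missing coordinates from the likelihood, which is why no extra computation is incurred; the same importance weights can then be reused for single and multiple imputation of x^m.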

Thu 13 June 10:15 - 10:20 PDT

On Scalable and Efficient Computation of Large Scale Optimal Transport

Yujia Xie · Minshuo Chen · Haoming Jiang · Tuo Zhao · Hongyuan Zha

Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its widespread use. To address the scalability issue, we propose an implicit generative learning-based framework called SPOT (Scalable Push-forward of Optimal Transport). Specifically, we approximate the optimal transport plan by the pushforward of a reference distribution and cast the optimal transport problem as a minimax problem. We can then solve OT problems efficiently using primal-dual stochastic gradient-type algorithms. We also show that we can recover the density of the optimal transport plan using neural ordinary differential equations. Numerical experiments on both synthetic and real datasets illustrate that SPOT is robust and has favorable convergence behavior. SPOT also allows us to efficiently sample from the optimal transport plan, which benefits downstream applications such as domain adaptation.
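
Schematically (our notation; the paper derives a specific primal-dual formulation), SPOT replaces the transport plan by the pushforward of a reference distribution p_Z through a generator G = (G_X, G_Y) and enforces the marginal constraints adversarially:

\min_{G}\; \max_{\lambda_X, \lambda_Y}\; \mathbb{E}_{Z \sim p_Z}\big[c\big(G_X(Z), G_Y(Z)\big)\big] + D_{\lambda_X}\big(G_X \# p_Z,\, \mu\big) + D_{\lambda_Y}\big(G_Y \# p_Z,\, \nu\big),

where c is the ground cost, \mu and \nu are the two marginals, and D_{\lambda} is a discriminator-based penalty that vanishes when the pushed-forward marginal matches its target. Both levels are optimized with stochastic gradient updates, and samples from the learned plan are obtained simply as G(Z).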