Session
Generative Models
Tensor Variable Elimination for Plated Factor Graphs
Fritz Obermeyer · Elias Bingham · Martin Jankowiak · Neeraj Pradhan · Justin Chiu · Alexander Rush · Noah Goodman
A wide class of machine learning algorithms can be reduced to variable elimination on factor graphs. While factor graphs provide a unifying notation for these algorithms, they do not provide a compact way to express repeated structure when compared to plate diagrams for directed graphical models. To exploit efficient tensor algebra in graphs with plates of variables, we generalize undirected factor graphs to plated factor graphs and variable elimination to a tensor variable elimination algorithm that operates directly on plated factor graphs. Moreover, we generalize complexity bounds based on treewidth and characterize the class of plated factor graphs for which inference is tractable. As an application, we integrate tensor variable elimination into the Pyro probabilistic programming language to enable exact inference in discrete latent variable models with repeated structure. We validate our methods with experiments on both directed and undirected graphical models, including applications to polyphonic music modeling, animal movement modeling, and latent sentiment analysis.
Predicate Exchange: Inference with Declarative Knowledge
Zenna Tavares · Javier Burroni · Edgar Minasyan · Armando Solar-Lezama · Rajesh Ranganath
We address the problem of conditioning probabilistic models on predicates, as a means to express declarative knowledge. Models conditioned on predicates rarely have a tractable likelihood; sampling from them requires likelihood-free inference. Existing likelihood-free inference methods focus on predicates which express observations. To address a broader class of predicates, we develop an inference procedure called predicate exchange, which \emph{softens} predicates. Soft predicates return values in a continuous Boolean algebra and can serve as a proxy likelihood function in inference. However, softening introduces an approximation error which depends on a temperature parameter. At zero-temperature predicates are identical, but are often intractable to condition on. At higher temperatures, soft predicates are easier to sample from, but introduce more error. To mitigate this trade-off, we simulate Markov chains at different temperatures and use replica exchange to swap between chains. We implement predicate exchange through a nonstandard execution of a simulation based model, and provide a light-weight tool that can be supplanted on top of existing probabilistic programming formalisms. We demonstrate the approach on sequence models of health and inverse rendering.
Discriminative Regularization for Latent Variable Models with Applications to Electrocardiography
Andrew Miller · Ziad Obermeyer · John Cunningham · Sendhil Mullainathan
Generative models use latent variables to represent structured variation in high-dimensional data, such as images, audio, and medical waveforms. These latent variables, however, may ignore subtle, yet meaningful features in the data. Some features may predict an outcome of interest (e.g. heart disease) but account for only a small fraction of variation in the data. We propose a generative model training objective that uses a black-box discriminative model as a regularizer to learn representations that preserve this predictive variation. With these discriminatively regularized generative models, we visualize and measure variation in the data that influence a black-box predictive model, enabling an expert to better understand each prediction. With this technique, we study models that use electrocardiograms to predict outcomes of clinical interest. We measure our approach on synthetic and real data with statistical summaries and an experiment carried out by a physician.
Hierarchical Decompositional Mixtures of Variational Autoencoders
Ping Liang Tan · Robert Peharz
Variational autoencoders (VAEs) have received considerable attention, since they allow us to learn expressive neural density estimators effectively and efficiently. However, learning and inference in VAEs is still problematic due to the sensitive interplay between the generative model and the inference network. Since these problems become generally more severe in high dimensions, we propose a novel hierarchical mixture model over low-dimensional VAE experts. Our model decomposes the overall learning problem into many smaller problems, which are coordinated by the hierarchical mixture, represented by a sum-product network. In experiments we show that our model consistently outperforms classical VAEs on all of our experimental benchmarks. Moreover, we show that our model is highly data efficient and degrades very gracefully in extremely low data regimes.
Finding Mixed Nash Equilibria of Generative Adversarial Networks
Ya-Ping Hsieh · Chen Liu · Volkan Cevher
Generative adversarial networks (GANs) are known to achieve the state-of-the-art performance on various generative tasks, but these results come at the expense of a notoriously difficult training phase. Current training strategies typically draw a connection to optimization theory, whose scope is restricted to local convergence due to the presence of non-convexity. In this work, we tackle the training of GANs by rethinking the problem formulation from the mixed Nash Equilibria (NE) perspective. Via a classical lifting trick, we show that essentially all existing GAN objectives can be relaxed into their mixed strategy forms, whose global optima can be solved via sampling, in contrast to the exclusive use of optimization framework in previous work. We further propose a mean-approximation sampling scheme, which allows to systematically exploit methods for bi-affine games to delineate novel, practical training algorithms of GANs. Finally, we provide experimental evidence that our approach yields comparable or superior results to contemporary training algorithms, and outperforms classical methods such as SGD, Adam, and RMSProp.
CompILE: Compositional Imitation Learning and Execution
Thomas Kipf · Yujia Li · Hanjun Dai · Vinicius Zambaldi · Alvaro Sanchez-Gonzalez · Edward Grefenstette · Pushmeet Kohli · Peter Battaglia
We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data. CompILE uses a novel unsupervised, fully-differentiable sequence segmentation module to learn latent encodings of sequential data that can be re-composed and executed to perform new tasks. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our CompILE-based agent could learn given only sparse rewards, where agents without task-specific policies struggle.
Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data
Luigi Antelmi · Nicholas Ayache · Philippe Robert · Marco Lorenzi
Interpretable modeling of heterogeneous data channels is essential in medical applications, for example when jointly analyzing clinical scores and medical images. Variational Autoencoders (VAE) are powerful generative models that learn representations of complex data. The flexibility of VAE may come at the expense of lack of interpretability in describing the joint relationship between heterogeneous data. To tackle this problem, in this work we extend the variational framework of VAE to bring parsimony and interpretability when jointly account for latent relationships across multiple channels. In the latent space, this is achieved by constraining the variational distribution of each channel to a common target prior. Parsimonious latent representations are enforced by variational dropout. Experiments on synthetic data show that our model correctly identifies the prescribed latent dimensions and data relationships across multiple testing scenarios. When applied to imaging and clinical data, our method allows to identify the joint effect of age and pathology in describing clinical condition in a large scale clinical cohort.
Deep Generative Learning via Variational Gradient Flow
Yuan Gao · Yuling Jiao · Yang Wang · Yao Wang · Can Yang · Shunkang Zhang
We propose a general framework to learn deep generative models via \textbf{V}ariational \textbf{Gr}adient Fl\textbf{ow} (VGrow) on probability spaces. The evolving distribution that asymptotically converges to the target distribution is governed by a vector field, which is the negative gradient of the first variation of the $f$-divergence between them. We prove that the evolving distribution coincides with the pushforward distribution through the infinitesimal time composition of residual maps that are perturbations of the identity map along the vector field. The vector field depends on the density ratio of the pushforward distribution and the target distribution, which can be consistently learned from a binary classification problem. Connections of our proposed VGrow method with other popular methods, such as VAE, GAN and flow-based methods, have been established in this framework, gaining new insights of deep generative learning. We also evaluated several commonly used divergences, including Kullback-Leibler, Jensen-Shannon, Jeffrey divergences as well as our newly discovered ``logD'' divergence which serves as the objective function of the logD-trick GAN. Experimental results on benchmark datasets demonstrate that VGrow can generate high-fidelity images in a stable and efficient manner, achieving comparable performance with state-of-the-art GAN.
Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design
Jonathan Ho · Peter Chen · Aravind Srinivas · Rocky Duan · Pieter Abbeel
Flow-based generative models are powerful exact likelihood models with efficient sampling and inference. Despite their computational efficiency, flow-based models generally have much worse density modeling performance compared to state-of-the-art autoregressive models. In this paper, we investigate and improve upon three limiting design choices employed by flow-based models in prior work: the use of uniform noise for dequantization, the use of inexpressive affine flows, and the use of purely convolutional conditioning networks in coupling layers. Based on our findings, we propose Flow++, a new flow-based model that is now the state-of-the-art non-autoregressive model for unconditional density estimation on standard image benchmarks. Our work has begun to close the significant performance gap that has so far existed between autoregressive models and flow-based models.
Learning Neurosymbolic Generative Models via Program Synthesis
Halley R Young · Osbert Bastani · Mayur Naik
Significant strides have been made toward designing better generative models in recent years. Despite this progress, however, state-of-the-art approaches are still largely unable to capture complex global structure in data. For example, images of buildings typically contain spatial patterns such as windows repeating at regular intervals; state-of-the-art generative methods can't easily reproduce these structures. We propose to address this problem by incorporating programs representing global structure into the generative model---e.g., a 2D for-loop may represent a configuration of windows. Furthermore, we propose a framework for learning these models by leveraging program synthesis to generate training data. On both synthetic and real-world data, we demonstrate that our approach is substantially better than the state-of-the-art at both generating and completing images that contain global structure.