The workshop focuses on the theory, methodology, and application of structured probabilistic inference and generative modeling, both of which are important topics in machine learning. Specifically, probabilistic inference addresses the problems of amortization, sampling, and integration of complex quantities from graphical models, while generative modeling captures the underlying probability distribution of a dataset. Apart from applications in computer vision, natural language processing, and speech recognition, probabilistic inference and generative modeling approaches have also been widely used in natural science domains, including physics, chemistry, molecular biology, and medicine. Despite promising results, probabilistic methods face challenges when applied to highly structured data, which are ubiquitous in real-world settings, and this limits their applicability. This workshop aims to bring together experts from diverse backgrounds and related domains to discuss the applications and challenges of probabilistic methods. The workshop will emphasize challenges in encoding domain knowledge when learning representations, performing inference, and generating data. By bringing together experts from academia and industry, the workshop will provide a platform for researchers to share their latest results and ideas, fostering collaboration and discussion in the field of probabilistic methods.
Fri 12:00 p.m. - 12:10 p.m. | Opening Remark
Dinghuai Zhang · Yuanqi Du · Chenlin Meng · Shawn Tan · Yingzhen Li · Max Welling · Yoshua Bengio

Fri 12:10 p.m. - 12:50 p.m. | Invited Talk by Karen Ullrich
Karen Ullrich

Fri 12:50 p.m. - 1:30 p.m. | Invited Talk by Tommi Jaakkola
Tommi Jaakkola

Fri 1:30 p.m. - 1:50 p.m. | Coffee Break

Fri 1:50 p.m. - 2:30 p.m. | Invited Talk by Durk Kingma
Diederik Kingma

Fri 2:30 p.m. - 2:40 p.m. | Contributed Talk: Collapsed Inference for Bayesian Deep Learning

Fri 2:40 p.m. - 2:50 p.m. | Contributed Talk: Provable benefits of score matching
Andrej Risteski

Fri 2:50 p.m. - 3:00 p.m. | Contributed Talk: BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

Fri 3:00 p.m. - 4:00 p.m. | Poster Session 1

Fri 4:00 p.m. - 5:00 p.m. | Panel Discussion
Chenlin Meng · Yang Song · Yilun Xu · Ricky T. Q. Chen · Charlotte Bunne · Arash Vahdat

Fri 5:00 p.m. - 5:40 p.m. | Invited Talk by Ruqi Zhang
Ruqi Zhang

Fri 5:40 p.m. - 6:20 p.m. | Invited Talk by Stefano Ermon
Stefano Ermon

Fri 6:20 p.m. - 6:30 p.m. | Contributed Talk: BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery

Fri 6:30 p.m. - 6:40 p.m. | Contributed Talk: Generative Marginalization Models

Fri 6:40 p.m. - 6:50 p.m. | Contributed Talk: Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network

Fri 6:50 p.m. - 7:00 p.m. | Closing Remark

Fri 7:00 p.m. - 8:00 p.m. | Poster Session 2

Anomaly Detection in Networks via Score-Based Generative Models (Poster)
Node outlier detection in attributed graphs is a challenging problem for which there is no method that would work well across different datasets. Motivated by the state-of-the-art results of score-based models in graph generative modeling, we propose to incorporate them into the aforementioned problem. Our method achieves competitive results on small-scale graphs. We provide an empirical analysis of the Dirichlet energy, and show that generative models might struggle to accurately reconstruct it. |
Dmitrii Gavrilev · Evgeny Burnaev

Practical and Asymptotically Exact Conditional Sampling in Diffusion Models (Poster)
Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on expensive, task-specific conditional training or error-prone heuristic approximations to them. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS, a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. The main idea is to use twisting, an SMC technique that enjoys good computational efficiency, to incorporate heuristic approximations without compromising asymptotic exactness. We study the properties of TDS on MNIST image inpainting and class-conditional generation tasks. TDS extends to Riemannian diffusion models, which are crucial for protein modeling. When applied to the motif-scaffolding problem, a core problem in protein design, TDS enables more flexible conditioning criteria than conditionally trained models, and provides state-of-the-art success rates on 9/12 problems in a benchmark set with scaffolds shorter than 100 residues.
Brian Trippe · Luhuan Wu · Christian Naesseth · David Blei · John Cunningham

Generative semi-supervised learning with a neural seq2seq noisy channel (Poster)
We use a neural noisy channel generative model to learn the relationship between two sequences, for example text and speech, from little paired data. We identify time locality as a key assumption which is restrictive enough to support semi-supervised learning but general enough to be widely applicable. Experimentally we show that our approach is capable of recovering the relationship between written and spoken language (represented as graphemes and phonemes) from only 5 minutes of paired data. Our results pave the way for more widespread adoption of generative semi-supervised learning for seq2seq tasks. |
Soroosh Mariooryad · Matt Shannon · Siyuan Ma · Tom Bagby · David Kao · Daisy Stanton · Eric Battenberg · RJ Skerry-Ryan

Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation (Poster)
The practical utility of causality in decision-making is widely recognized, with causal discovery and inference being inherently intertwined. Nevertheless, a notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference. To address this gap, we evaluate six established baseline causal discovery methods and a newly proposed method based on GFlowNets, on the downstream task of treatment effect estimation. Through the implementation of a robust evaluation procedure, we offer valuable insights into the efficacy of these causal discovery methods for treatment effect estimation, considering both synthetic and real-world scenarios, as well as low-data scenarios. Furthermore, the results of our study demonstrate that GFlowNets possess the capability to effectively capture a wide range of useful and diverse ATE modes. |
Chris Emezue · Alexandre Drouin · Tristan Deleu · Stefan Bauer · Yoshua Bengio

Conditional Graph Generation with Graph Principal Flow Network (Poster)
Conditional graph generation is crucial and challenging since the conditional distribution of graph topology and features is complicated and the semantic features are hard for a generative model to capture. In this work, we propose a novel graph conditional generative model, termed Graph Principal Flow Network (GPrinFlowNet), which enables us to progressively generate graphs from low- to high-frequency components. Our GPrinFlowNet effectively captures the subtle yet essential semantic features of graph topology, resulting in high-quality generated graph data.
Tianze Luo · Zhanfeng Mo · Sinno Jialin Pan

Deep Generative Clustering with Multimodal Variational Autoencoders (Poster)
Multimodal VAEs have recently received significant attention as generative models for weakly-supervised learning with multiple heterogeneous modalities. In parallel, VAE-based methods have been explored as probabilistic approaches for clustering tasks. Our work lies at the intersection of these two research directions. We propose a novel multimodal VAE model, in which the latent space is extended to learn data clusters, leveraging shared information across modalities. Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, our method favourably compares to alternative clustering approaches, in weakly-supervised settings. Notably, we propose a post-hoc procedure that avoids the need for our method to have a priori knowledge of the true number of clusters, mitigating a critical limitation of previous clustering frameworks. |
Emanuele Palumbo · Sonia Laguna · Daphné Chopard · Julia Vogt

Graph Neural Network Powered Bayesian Optimization for Large Molecular Spaces (Poster)
In silico screening is an essential component of drug and materials discovery. This is challenged by the increasingly intractable size of virtual libraries and the high cost of evaluating properties. We propose GNN-SS, a Graph Neural Network-powered Bayesian Optimization (BO) algorithm as a scalable solution. GNN-SS utilizes random sub-sampling to reduce the computational complexity of the BO problem, and diversifies queries for training the model. GNN-SS is sample-efficient, and rapidly narrows the search space by leveraging the generalization ability of GNNs. Our algorithm performs competitively on the QM9 dataset and achieves state-of-the-art performance on the PMO benchmark. |
Miles Wang-Henderson · Bartu Soyuer · Parnian Kassraie · Andreas Krause · Ilija Bogunovic

The Pairwise Prony Algorithm: Efficient Inference of Stochastic Block Models with Prescribed Subgraph Densities (Poster)
We present an elegant and flexible algorithm that provides the parameters of the simplest stochastic block model (SBM) for a given set of prescribed subgraph densities, from which one can sample networks with negligible computational overhead. The method generalizes the classical method of Prony to the pairwise data of networks. The class of inferred models are at the intersection of exponential random graph models (ERGMs), which are characterized in terms of maximum entropy, and of exchangeable random graphs (i.e., graphons). We show that the required subgraph densities can be efficiently computed for both dense and sparse networks, and provide an implementation of our algorithm in python. Our method provides standardized null models for statistical analysis of network data, including for the challenging case of a single observed graph. |
Lee M Gunderson · Gecia Bravo-Hermsdorff · Peter Orbanz

Plug-and-Play Controllable Graph Generation with Diffusion Models (Poster)
Diffusion models for graph generation present transformative capabilities in generating high-quality graphs. However, controlling the properties of the generated graphs remains a challenging task for the existing methods as they mainly focus on uncontrolled graph generation from the data. To address this limitation, we propose PRODIGY (PROjected DIffusion for generating constrained Graphs), a novel approach for controllable graph generation that works with any pre-trained diffusion model. This formalizes the problem of controlled graph generation and identifies a class of constraints (e.g., edge count, valency, etc.) applicable to practical graph generation tasks. At the center of our approach is a plug-and-play sampling process, based on projection-based optimization to ensure that each generated graph satisfies the specified constraints. Experiments demonstrate the effectiveness of PRODIGY in generating high-quality and diverse graphs that satisfy the specified constraints while staying close to the training distribution. |
Kartik Sharma · Srijan Kumar · Rakshit Trivedi

The Local Inconsistency Resolution Algorithm (Poster)
We present a generic algorithm for learning and approximate inference across a broad class of statistical models, that unifies many approaches in the literature. Our algorithm, called local inconsistency resolution (LIR), has an intuitive epistemic interpretation. It is based on the theory of probabilistic dependency graphs (PDGs), an expressive class of graphical models rooted in information theory, that can capture inconsistent beliefs. |
Oliver Richardson

Towards Modular Learning of Deep Causal Generative Models (Poster)
Shpitser & Pearl (2008) proposed sound and complete algorithms to compute identifiable observational, interventional, and counterfactual queries for certain causal graph structures. However, these algorithms assume that we can correctly estimate the joint distributions, which is impractical for high-dimensional datasets. During the current rise of foundational models, we have access to large pre-trained models to generate realistic high-dimensional samples. To address the causal inference problem with high dimensional data, we propose a sequential adversarial training algorithm for learning deep causal generative models by dividing the training problem into independent sub-parts, thereby enabling the use of such pre-trained models. Our proposed algorithm called WhatIfGAN, arranges generative models according to a causal graph and trains them to imitate the underlying causal model even with unobserved confounders. Finally, with a semi-synthetic Colored MNIST dataset, we show that WhatIfGAN can sample from identifiable causal queries involving high-dimensional variables. |
Md Musfiqur Rahman · Murat Kocaoglu

Pretrained Language Models to Solve Graph Tasks in Natural Language (Poster)
Pretrained large language models (LLMs) are powerful learners in a variety of language tasks. We explore if LLMs can learn from graph-structured data when the graphs are described using natural language. We explore data augmentation and pretraining specific to the graph domain and show that LLMs such as GPT-2 and GPT-3 are promising alternatives to graph neural networks. |
Frederik Wenkel · Guy Wolf · Boris Knyazev

Non-Normal Diffusion Models (Poster)
Diffusion models generate samples by incrementally reversing a process that turns data into noise. We show that when the step size goes to zero, the reversed process is invariant to the distribution of these increments. This reveals a previously unconsidered parameter in the design of diffusion models: the distribution of the diffusion step $\boldsymbol \Delta \mathbf{x}_k = \mathbf{x}_k - \mathbf{x}_{k + 1}$. This parameter is implicitly set by default to be normally distributed in most diffusion models. By lifting this assumption, we generalize the framework for designing diffusion models and establish an expanded class of diffusion processes with greater flexibility in the choice of loss function used during training. We demonstrate the effectiveness of these models on density estimation and generative modeling tasks on standard image datasets, and show that different choices of the distribution of $\boldsymbol\Delta \mathbf{x}_k$ result in qualitatively different generated samples.
Henry Li
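A minimal sketch of the idea the abstract describes: run the forward noising chain with non-Gaussian increments. The per-step scaling and the choice of a Laplace increment law are illustrative assumptions, not the paper's actual parameterization.

```python
import torch

def forward_diffuse(x0, n_steps=1000, increment="laplace"):
    # Run x_{k+1} = x_k + dx_k with a chosen increment distribution.
    # The per-step scale is picked so the total added variance is roughly 1.
    x = x0.clone()
    step_std = (1.0 / n_steps) ** 0.5
    for _ in range(n_steps):
        if increment == "laplace":
            # Laplace(0, b) has variance 2*b^2, so b = step_std / sqrt(2)
            # matches the variance of the usual Gaussian increment.
            dx = torch.distributions.Laplace(0.0, step_std / 2 ** 0.5).sample(x.shape)
        else:
            dx = step_std * torch.randn_like(x)
        x = x + dx
    return x
```
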
A Generative Model for Text Control in Minecraft (Poster)
Constructing AI models that respond to text instructions is challenging, especially for (multi-modal) sequential decision-making tasks. This study introduces an instruction-tuned Video Pretraining (VPT) model for Minecraft called STEVE-1, demonstrating that the unCLIP approach, utilized in DALL•E 2, is also effective for creating instruction-following sequential decision-making agents. STEVE-1 is trained in two steps: adapting the pretrained VPT model to follow commands in MineCLIP's latent space, then training a prior to predict latent codes from text. This allows us to finetune VPT through self-supervised behavioral cloning and hindsight relabeling, bypassing the need for costly human text annotations. By leveraging pretrained models like VPT and MineCLIP and employing best practices from text-conditioned image generation, STEVE-1 costs just $60 to train and can follow nearly any short-horizon open-ended text and visual task in Minecraft. We provide experimental evidence highlighting key factors for downstream performance, including pretraining, classifier-free guidance, and data scaling. All resources, including our model weights, datasets, and evaluation tools, are made available for further research. |
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith

Visual Chain-of-Thought Diffusion Models (Poster)
Recent progress with conditional image diffusion models has been stunning, and this holds true whether we are speaking about models conditioned on a text description, a scene layout, or a sketch. Unconditional image diffusion models are also improving but lag behind, as do diffusion models which are conditioned on lower-dimensional features like class labels. We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure. In the first stage we sample an embedding describing the semantic content of the image. In the second stage we sample the image conditioned on this embedding and then discard the embedding. Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25 - 50% compared to standard unconditional generation. |
William Harvey · Frank Wood

Collapsed Inference for Bayesian Deep Learning (Oral)
Bayesian neural networks~(BNNs) provide a formalism to quantify and calibrate uncertainty in deep learning. Current inference approaches for BNNs often resort to few-sample estimation for scalability, which can harm predictive performance, while its alternatives tend to be computationally prohibitively expensive. We tackle this challenge by revealing a previously unseen connection between inference on BNNs and volume computation problems. With this observation, we introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples. It improves over a Monte-Carlo sample by limiting sampling to a subset of the network weights while pairing it with some closed-form conditional distribution over the rest. A collapsed sample represents uncountably many models drawn from the approximate posterior and thus yields higher sample efficiency. Further, we show that the marginalization of a collapsed sample can be solved analytically and efficiently despite the non-linearity of neural networks by leveraging existing volume computation solvers. Our proposed use of collapsed samples achieves a balance between scalability and accuracy. On various regression and classification tasks, our collapsed Bayesian deep learning approach demonstrates significant improvements over existing methods and sets a new state of the art in terms of uncertainty estimation and predictive performance. |
Zhe Zeng · Guy Van den Broeck

Your Diffusion Model is Secretly a Zero-Shot Classifier (Poster)
The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. However, almost all use cases so far have solely focused on sampling. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. We also find that our diffusion-based approach has stronger multimodal relational reasoning abilities than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Even though these models are trained with weak augmentations and no regularization, they approach the performance of SOTA discriminative classifiers. Overall, our results are a step toward using generative over discriminative models for downstream tasks.
Alexander Li · Mihir Prabhudesai · Shivam Duggal · Ellis Brown · Deepak Pathak
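A rough sketch of the kind of computation described above: score each class by how well the class-conditional denoiser predicts the noise added to the input, then pick the class with the lowest error. The eps_model(x_t, t, c) interface and the linear noise schedule are assumptions made for illustration, not the authors' implementation.

```python
import torch

def diffusion_zero_shot_classify(eps_model, x0, classes, n_draws=32, T=1000):
    # A linear beta schedule stands in for whatever schedule the real model uses.
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    errors = {}
    for c in classes:
        total = 0.0
        for _ in range(n_draws):
            t = torch.randint(0, T, (1,))
            eps = torch.randn_like(x0)
            ab = alpha_bars[t]
            x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
            total += ((eps_model(x_t, t, c) - eps) ** 2).mean().item()
        errors[c] = total / n_draws
    # lowest average denoising error ~ highest conditional (ELBO-based) density for this input
    return min(errors, key=errors.get)
```
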
Test-time Adaptation with Diffusion Models (Poster)
We find that generative models can be great test-time adapters for discriminative models. We propose a method to adapt pre-trained classifiers and large-scale CLIP models to individual unlabelled images by modulating the text conditioning of a text-conditional pretrained image diffusion model and maximizing the image likelihood using end-to-end backpropagation to the classifier parameters. We improve the classification accuracy of various pretrained classifiers on various datasets, including ImageNet and its variants. Further we show that our approach significantly outperforms previous test-time adaptation methods. To the best of our knowledge, this is the first work that adapts pre-trained large-scale discriminative models to individual images; all previous works require co-training under joint discriminative and self-supervised objectives, to apply at test time, which prevents them from adapting readily available models. |
Mihir Prabhudesai · Tsung-Wei Ke · Alexander Li · Deepak Pathak · Katerina Fragkiadaki

Beyond Confidence: Reliable Models Should Also Consider Atypicality (Poster)
While most machine learning models can provide confidence in their predictions, confidence is insufficient to understand a prediction's reliability. For instance, the model may have a low confidence prediction if the input is not well-represented in the training dataset or if the input is inherently ambiguous. In this work, we investigate the relationship between how atypical (rare) a sample or a class is and the reliability of a model's predictions. We first demonstrate that atypicality is strongly related to miscalibration and accuracy. In particular, we empirically show that predictions for atypical inputs or atypical classes are more overconfident and have lower accuracy. Using these insights, we show incorporating atypicality improves uncertainty quantification and model performance for discriminative neural networks and large language models. In a case study, we show that using atypicality improves the performance of a skin lesion classifier across different skin tone groups without having access to the group attributes. Overall, we propose that models should use not only confidence but also atypicality to improve uncertainty quantification and performance. Our results show that simple atypicality estimators already provide large benefits.
Mert Yuksekgonul · Linjun Zhang · James Zou · Carlos Guestrin

Implications of kernel mismatch for OOD data (Poster)
Gaussian processes provide reliable uncertainty estimates in nonlinear modeling, but a poor choice of the kernel can lead to slow learning. Although learning the hyperparameters of the kernel typically leads to optimal generalization on in-distribution test data, we show that the generalization can be poor on out-of-distribution test data. We then investigate a smoothness learning method, heavier tails, and deep kernel learning as solutions, finding some evidence in favor of the first two. |
Beau Coker · Finale Doshi-Velez

Scaling Graphically Structured Diffusion Models (Poster)
Applications of the recently introduced graphically structured diffusion model (GSDM) family show that sparsifying the transformer attention mechanism within a diffusion model and meta-training on a variety of conditioning tasks can yield an efficiently learnable diffusion model artifact that is capable of flexible, in the sense of observing different subsets of variables at test-time, amortized conditioning in probabilistic graphical models. While extremely promising in terms of applicability and utility, implementations of GSDMs prior to this work were not scalable beyond toy graphical model sizes. We overcome this limitation by describing and solving two scaling issues related to GSDMs: one engineering and one methodological. We additionally propose a new benchmark problem of weight inference for a convolutional neural network applied to $14\times14$ MNIST.
Christian Weilbach · William Harvey · Hamed Shirzad · Frank Wood

Diffusion map particle systems for generative modeling (Poster)
We propose a novel diffusion map particle system (DMPS) for generative modeling, based on diffusion maps and Laplacian-adjusted Wasserstein gradient descent (LAWGD). Diffusion maps are used to approximate the generator of the Langevin diffusion process from samples, and hence to learn the underlying data-generating manifold. On the other hand, LAWGD enables efficient sampling from the target distribution given a suitable choice of kernel, which we construct here via a spectral approximation of the generator, computed with diffusion maps. Our method requires no offline training and minimal tuning, and can outperform other approaches on data sets of moderate dimension. |
Fengyi Li · Youssef Marzouk

C-Disentanglement: Discovering Causally-Independent Generative Factors under an Inductive Bias of Confounder (Poster)
Representation learning assumes that real-world data is generated by a few causally disentangled generative factors (i.e., sources of variation). However, most existing works assume unconfoundedness (i.e., there are no common causes to the generative factors) in the discovery process, and thus obtain only statistical independence. In this paper, we recognize the importance of modeling confounders in discovering causal generative factors. Unfortunately, such factors are not identifiable without proper inductive bias. We fill the gap by introducing a framework named Confounded-Disentanglement (C-Disentanglement), the first framework that explicitly introduces the inductive bias of confounder via labels/knowledge from domain expertise. We further propose an approach for sufficient identification under the VAE framework.
Xiaoyu Liu · Jiaxin Yuan · Bang An · Yuancheng Xu · Yifan Yang · Furong Huang

An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets (Poster)
Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return, $R(x)$. GFlowNets are a special class of algorithms designed to generate diverse candidates, $x$, from a discrete set, by learning a policy that approximates the proportional sampling of $R(x)$. GFlowNets exhibit improved mode discovery compared to conventional RL algorithms, which is very useful for applications such as drug discovery and combinatorial search. However, since GFlowNets are a relatively recent class of algorithms, many techniques which are useful in RL have not yet been associated with them. In this paper, we study the utilization of a replay buffer for GFlowNets. We explore empirically various replay buffer sampling techniques and assess the impact on the speed of mode discovery and the quality of the modes discovered. Our experimental results in the Hypergrid toy domain and a molecule synthesis environment demonstrate significant improvements in mode discovery when training with a replay buffer, compared to training only with trajectories generated on-policy.
Nikhil Murali Vemgal · Elaine Lau · Doina Precup
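For readers unfamiliar with the mechanism being studied, a minimal trajectory replay buffer looks roughly like the sketch below; the class name, the reward-weighted sampling option, and the FIFO eviction policy are illustrative choices, not the paper's exact setup.

```python
import random

class TrajectoryReplayBuffer:
    """Store completed GFlowNet trajectories and re-sample them for off-policy updates."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.storage = []

    def add(self, trajectory, reward):
        self.storage.append((trajectory, reward))
        if len(self.storage) > self.capacity:
            self.storage.pop(0)  # drop the oldest trajectory (FIFO eviction)

    def sample(self, batch_size, prioritize_reward=False):
        if prioritize_reward:
            # reward-weighted sampling: high-reward modes are replayed more often
            weights = [max(r, 1e-8) for _, r in self.storage]
            return random.choices(self.storage, weights=weights, k=batch_size)
        return random.sample(self.storage, min(batch_size, len(self.storage)))
```
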
Nonparametric posterior normalizing flows (Poster)
Normalizing flows allow us to describe complex probability distributions, and can be used to perform flexible maximum likelihood density estimation (Dinh et al., 2014). Such maximum likelihood density estimation is likely to overfit, particularly if the number of observations is small. Traditional Bayesian approaches offer the prospect of capturing posterior uncertainty, but come at high computational cost and do not provide an intuitive way of incorporating prior information. A nonparametric learning approach (Lyddon et al., 2018) allows us to combine observed data with priors on the space of observations. We present a scalable approximate inference algorithm for nonparametric posterior normalizing flows, and show that the resulting distributions can yield improved generalization and uncertainty quantification. |
Sinead A Williamson · Evan Ott

Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models (Poster)
Reinforcement learning presents an attractive paradigm to reason about several distinct aspects of sequential decision making, such as specifying complex goals, planning future observations and actions, and critiquing their utilities, demanding a balance between expressivity and flexible modeling for efficient learning and inference. We present Decision Stacks, a probabilistic generative framework that decomposes goal-conditioned policy agents into 3 generative modules which simulate the temporal evolution of observations, rewards, and actions. Our framework guarantees both expressivity and flexibility in designing individual modules to account for key factors such as architectural bias, optimization objective and dynamics, transferability across domains, and inference speed. Our empirical results demonstrate the effectiveness of Decision Stacks for offline policy optimization for several MDP and POMDP environments.
Siyan Zhao · Aditya Grover

BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping (Oral)
Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Existing distillation methods either require significant amounts of offline computation for generating synthetic training data or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher given any time step. Such a model can be efficiently trained based on bootstrapping from two consecutive sampled steps. Furthermore, our method can be easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods given that the training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmarks, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that BOOT is able to handle highly complex distributions, shedding light on efficient generative modeling.
Jiatao Gu · Shuangfei Zhai · Yizhe Zhang · Lingjie Liu · Joshua M Susskind

Diffusion Probabilistic Models Generalize when They Fail to Memorize (Poster)
In this work, we study the training of diffusion probabilistic models through a series of hypotheses and carefully designed experiments. We call our key finding the memorization-generalization dichotomy, and it asserts that generalization and memorization are mutually exclusive phenomena. This contrasts with the modern wisdom of supervised learning that deep neural networks exhibit "benign" overfitting and generalize well despite overfitting the data. |
TaeHo Yoon · Joo Young Choi · Sehyun Kwon · Ernest Ryu

Solving Inverse Physics Problems with Score Matching (Poster)
We propose to solve inverse problems involving the temporal evolution of physics systems by leveraging recent advances from diffusion models. Our method moves the system's current state backward in time step by step by combining an approximate inverse physics simulator and a learned correction function. Training the learned correction with a single-step loss is equivalent to a score matching objective, while recursively predicting longer parts of the trajectory during training relates to maximum likelihood training of a corresponding probability flow. Our resulting inverse solver has excellent accuracy and temporal stability and, in contrast to other learned inverse solvers, allows for sampling the posterior of the solutions.
Benjamin Holzschuh · Simona Vegetti · Nils Thuerey

Attention as Implicit Structural Inference (Poster)
Attention mechanisms play a crucial role in cognitive systems by allowing them to flexibly allocate cognitive resources. Transformers, in particular, have become a dominant architecture in machine learning, with attention as their central innovation. However, the underlying intuition and formalism of attention in Transformers is based on ideas of keys and queries in database management systems. In this work, we pursue a structural inference perspective, building upon, and bringing together, previous theoretical descriptions of attention such as Gaussian Mixture Models, alignment mechanisms, and Hopfield Networks. Specifically, we demonstrate that attention can be viewed as inference over an implicitly defined set of possible adjacency structures in a graphical model, revealing the generality of such a mechanism. This perspective unifies different attentional architectures in machine learning and suggests potential modifications and generalizations of attention. We hope that by providing a new lens on attention architectures, our work can guide the development of new and improved attentional mechanisms.
Ryan Singh · Christopher Buckley

Prediction under Latent Subgroup Shifts with High-dimensional Observations (Poster)
We introduce a new approach to prediction in graphical models with latent-shift adaptation, i.e., where source and target environments differ in the distribution of an unobserved confounding latent variable. Previous work has shown that as long as "concept" and "proxy" variables with appropriate dependence are observed in the source environment, the latent-associated distributional changes can be identified, and target predictions adapted accurately. However, practical estimation methods do not scale well when the observations are complex and high-dimensional, even if the confounding latent is categorical. Here we build upon a recently proposed probabilistic unsupervised learning framework, the recognition-parametrised model (RPM), to recover low-dimensional, discrete latents from image observations. Applied to the problem of latent shifts, our novel form of RPM identifies causal latent structure in the source environment, and adapts properly to predict in the target. We demonstrate results in settings where predictor and proxy are high-dimensional images, a context to which previous methods fail to scale.
William Walker · Arthur Gretton · Maneesh Sahani

Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network (Oral)
Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data. |
Tristan Deleu · Mizu Nishikawa-Toomey · Jithendaraa Subramanian · Nikolay Malkin · Laurent Charlin · Yoshua Bengio

Diffusion Based Causal Representation Learning (Poster)
Causal reasoning can be considered a cornerstone of intelligent systems. Having access to an underlying causal graph comes with the promise of cause-effect estimation and the identification of efficient and safe interventions. However, depending on the application and the complexity of the system one causal graph might be insufficient and even the variables of interest and levels of abstractions might change. This is incompatible with currently deployed generative models including popular VAE approaches which provide only representations from a point estimate. In this work, we study recently introduced diffusion-based representations which offer access to infinite dimensional latent codes which encode different levels of information in the latent code. In a first proof of principle, we investigate the use of a single point of these infinite dimensional codes for causal representation learning and demonstrate experimentally that this approach performs comparably well in identifying the causal structure and causal variables. |
Amir Mohammad Karimi Mamaghan · Francesco Quinzan · Andrea Dittadi · Stefan Bauer

Provable benefits of score matching (Oral)
Score matching is an alternative to maximum likelihood (ML) for estimating a probability distribution parametrized up to a constant of proportionality. By fitting the ''score'' of the distribution, it sidesteps the need to compute this constant of proportionality (which is often intractable). While score matching and variants thereof are popular in practice, precise theoretical understanding of the benefits and tradeoffs with maximum likelihood---both computational and statistical---are not well understood. In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize, and has a comparable statistical efficiency to ML, while the ML loss is intractable to optimize using a gradient-based method. The family consists of exponentials of polynomials of fixed degree, and our result can be viewed as a continuous analogue of recent developments in the discrete setting. Precisely, we show: (1) Designing a zeroth-order or first-order oracle for optimizing the maximum likelihood loss is NP-hard. (2) Maximum likelihood has a statistical efficiency polynomial in the ambient dimension and the radius of the parameters of the family. (3) Minimizing the score matching loss is both computationally and statistically efficient, with complexity polynomial in the ambient dimension. |
Chirag Pabbaraju · Dhruv Rohatgi · Anish Sevekari · Holden Lee · Ankur Moitra · Andrej Risteski
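As a reminder of the objective being analyzed (written here in the standard Hyvärinen form, not the paper's notation), score matching fits the model score to the data without ever touching the partition function:

$$ J_{\mathrm{SM}}(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ \tfrac{1}{2}\,\lVert \nabla_x \log p_\theta(x) \rVert^2 + \operatorname{tr}\big(\nabla_x^2 \log p_\theta(x)\big) \right]. $$

For an exponential family $p_\theta(x) \propto \exp(\langle \theta, T(x) \rangle)$, the score is $\nabla_x \log p_\theta(x) = (\partial T(x)/\partial x)^\top \theta$, so the intractable normalizer $\log Z(\theta)$ drops out and $J_{\mathrm{SM}}$ is quadratic in $\theta$, which is the contrast with maximum likelihood that the result above makes precise.
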
An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction (Poster)
In this paper, we propose a novel method for joint entity and relation extraction from unstructured text by framing it as a conditional sequence generation problem. In contrast to conventional generative information extraction models that generate text as output, our approach generates a linearized graph where nodes represent text spans and edges represent relation triples. For that, our method employs a transformer encoder-decoder architecture with a pointing mechanism on a dynamic vocabulary of spans and relation types. In particular, our model can capture the structural characteristics and boundaries of entities and relations through span representations, while simultaneously grounding the generated output in the original text thanks to the pointing mechanism. Evaluation on benchmark datasets validates the effectiveness of our approach, demonstrating state-of-the-art results in entity and relation extraction tasks.
Urchade Zaratiana · Nadi Tomeh · Pierre Holat · Thierry Charnois

Nested Diffusion Processes for Anytime Image Generation (Poster)
Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable images when stopped at arbitrary times before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image. We use this Nested Diffusion approach to peek into the generation process and enable flexible scheduling based on the instantaneous preference of the user. In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that our method's intermediate generation quality greatly exceeds that of the original diffusion model, while the final slow generation result remains comparable. |
Noam Elata · Bahjat Kawar · Tomer Michaeli · Michael Elad

Training Diffusion Models with Reinforcement Learning (Poster)
Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for such objectives. We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms, which we refer to as denoising diffusion policy optimization (DDPO), that are more effective than alternative reward-weighted likelihood approaches. Empirically, DDPO is able to adapt text-to-image diffusion models to objectives that are difficult to express via prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Finally, we show that DDPO can improve prompt-image alignment using feedback from a vision-language model without the need for additional data collection or human annotation. |
Kevin Black · Michael Janner · Yilun Du · Ilya Kostrikov · Sergey Levine
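One way to write the policy-gradient estimator implied by the abstract (a standard REINFORCE-style form over the denoising chain; the exact variant, baselines, and importance weighting used by DDPO may differ):

$$ \nabla_\theta \mathcal{J} = \mathbb{E}\left[\, r(x_0, c) \sum_{t=1}^{T} \nabla_\theta \log p_\theta(x_{t-1} \mid x_t, c) \right], $$

where each reverse-diffusion step $p_\theta(x_{t-1} \mid x_t, c)$ is treated as an action in a $T$-step MDP and the reward $r(x_0, c)$ is received only at the final denoised sample.
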
Generating Turn-Based Player Behavior via Experience from Demonstrations (Poster)
Turn-based sports, such as badminton and tennis, present challenges for imitating human player behaviors from offline datasets in sports analytics. We propose RallyNet, a novel hierarchical offline imitation learning model for turn-based player behaviors. RallyNet captures players' decision dependencies by modeling decision-making processes in turn-based sports as a contextual Markov decision process (CMDP). It leverages experience to generate contexts that aid decision-making, reducing errors. Additionally, RallyNet models player interactions using a latent geometric Brownian motion, enhancing realism and introducing helpful inductive bias. Experimental results on a real-world badminton game dataset demonstrate the effectiveness of RallyNet, outperforming prior offline imitation learning approaches and a state-of-the-art turn-based supervised method. |
Kuang-Da Wang · Wei-Yao Wang · Ping-Chun Hsieh · Wen-Chih Peng

Automatic Rao-Blackwellization for Sequential Monte Carlo with Belief Propagation (Poster)
Exact Bayesian inference on state-space models (SSM) is in general intractable and, unfortunately, basic Sequential Monte Carlo (SMC) methods do not yield correct approximations for complex models. In this paper, we propose a mixed inference algorithm that computes closed-form solutions using Belief Propagation as much as possible, and falls back to sampling-based SMC methods when exact computations fail. This algorithm thus implements automatic Rao-Blackwellization and is even exact for Gaussian tree models.
Waïss Azizian · Guillaume Baudart · Marc Lelarge

Collaborative Score Distillation for Consistent Visual Synthesis (Poster)
Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as “particles” in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models. |
Subin Kim · Kyungmin Lee · June Suk Choi · Jongheon Jeong · Kihyuk Sohn · Jinwoo Shin

Exploring Exchangeable Dataset Amortization for Bayesian Posterior Inference (Poster)
Bayesian inference provides a natural way of incorporating uncertainties and different underlying theories when making predictions or analyzing complex systems. However, it requires computationally expensive routines for approximation, which have to be re-run when new data is observed and are thus infeasible to efficiently scale and reuse. In this work, we look at the problem from the perspective of amortized inference to obtain posterior parameter distributions for known probabilistic models. We propose a neural network-based approach that can handle exchangeable observations and amortize over datasets to convert the problem of Bayesian posterior inference into a single forward pass of a network. Our empirical analyses explore various design choices for amortized inference by comparing: (a) our proposed variational objective with forward KL minimization, (b) permutation-invariant architectures like Transformers and DeepSets, and (c) parameterizations of posterior families like diagonal Gaussian and Normalizing Flows. Through our experiments, we successfully apply amortization techniques to estimate the posterior distributions for different domains solely through inference. |
Sarthak Mittal · Niels Bracher · Guillaume Lajoie · Priyank Jaini · Marcus Brubaker

Function Space Bayesian Pseudocoreset for Bayesian Neural Networks (Poster)
A Bayesian pseudocoreset is a compact synthetic dataset summarizing essential information of a large-scale dataset and thus can be used as a proxy dataset for scalable Bayesian inference. Typically, a Bayesian pseudocoreset is constructed by minimizing a divergence measure between the posterior conditioning on the pseudocoreset and the posterior conditioning on the full dataset. However, evaluating the divergence can be challenging, particularly for models such as deep neural networks with high-dimensional parameters. In this paper, we propose a novel Bayesian pseudocoreset construction method that operates on a function space. Unlike previous methods, which construct and match the coreset and full data posteriors in the space of model parameters (weights), our method constructs variational approximations to the coreset posterior on a function space and matches it to the full data posterior in the function space. By working directly in the function space, our method can bypass several challenges that arise when working in a weight space, including limited scalability and the multi-modality issue.
Balhae Kim · Hyungi Lee · Juho Lee

Beyond Intuition, a Framework for Applying GPs to Real-World Data (Poster)
Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and scaling options. The framework is then applied to a case study of glacier elevation change and yields more accurate results at test time. |
Kenza Tazi · Jihao Andreas Lin · ST John · Hong Ge · Richard E Turner · Ross Viljoen · Alex Gardner

Identifying Under-Reported Events in Networks with Spatial Latent Variable Models (Poster)
Decision-makers often observe the occurrence of events through a reporting process. City governments, for example, rely on resident reports to register and then resolve urban infrastructural problems such as fallen street trees, over-flooding sewers, or rat infestations. In the absence of additional assumptions, events that occur but are not reported cannot be distinguished from events that truly did not occur, leading to systematic neglect in addressing problems in neighborhoods that comparatively under-report events. In this paper, we leverage a Bayesian model to describe this setting in the presence of network correlations in the event occurrence process. We present a sampling routine to estimate the report rates and the event occurrence incidence, as well as infer the ground truth of discrete latent states. We apply the model to flooding reports in New York City, publicly available via the 311 data portal. |
Gabriel Agostini · Emma Pierson · Nikhil Garg

Generative Marginalization Models (Oral)
We introduce marginalization models, a new family of generative model for high-dimensional discrete data. They offer scalable and flexible generative modeling with tractable likelihoods through explicit modeling of all induced marginal distributions. Marginalization models enable fast evaluation of arbitrary marginal probabilities with a single forward pass of the neural network, which overcomes a major limitation of methods with exact marginal inference such as autoregressive models (ARMs). They also support scalable training for any-order generative modeling that previous methods fail to achieve under the setting of distribution matching to a given desired probability (specified by an unnormalized probability function such as energy function or reward function). We demonstrate the effectiveness of the proposed model on a variety of discrete data distributions, including binary images, language, physical systems, and molecules, on both likelihood maximization and distribution matching tasks. Marginalization models achieve orders of magnitude speedup in evaluation of the probability mass function. For distribution matching, marginalization models enable scalable training of any-order generative models that previous methods fail to achieve. |
Sulin Liu · Peter Ramadge · Ryan P. Adams
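The constraint at the heart of this model family, written informally (the notation here is ours, not the paper's), is the ordinary sum rule enforced directly on the network's outputs:

$$ p_\theta(x_S) = \sum_{x_i} p_\theta(x_S, x_i) \qquad \text{for any subset } S \text{ and any variable } i \notin S, $$

so that a single forward pass can return any marginal $p_\theta(x_S)$ while the learned marginals remain mutually consistent.
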
Early Exiting for Accelerated Inference in Diffusion Models (Poster)
Diffusion models have achieved impressive results in generating content across domains like images, videos, text, and audio. However, their sampling speed is a practical challenge due to repeated evaluation of score estimation networks during inference. To address this, we propose a novel framework that optimizes compute allocation for score estimation, reducing overall sampling time. Our key insight is that the computation required for score estimation varies at different time steps. Based on this observation, we introduce an early-exiting scheme that selectively skips the subset of parameters in the score estimation network during the inference, guided by a time-dependent exit schedule. We apply this technique to image synthesis with diffusion models and demonstrate significantly improved sampling throughput without compromising image quality. Moreover, our approach seamlessly integrates with various types of solvers for faster sampling, leveraging their compatibility to enhance overall efficiency. |
Taehong Moon · Moonseok Choi · EungGu Yun · Jongmin Yoon · Gayoung Lee · Juho Lee

MissDiff: Training Diffusion Models on Tabular Data with Missing Values (Poster)
Diffusion models have shown remarkable performance in modeling data distributions and synthesizing data. The vanilla diffusion model typically requires complete or fully observed training data, while incomplete data is a common issue in various real-world applications, particularly in tabular data. This work presents a unified and principled diffusion-based framework for learning from data with missing values under various missing mechanisms. We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective. Then we propose to mask the regression loss of Denoising Score Matching in the training phase. We show that the proposed method is consistent in learning the score of data distributions, and the training objective serves as an upper bound for the negative likelihood in certain cases. The proposed framework is evaluated on multiple tabular datasets using realistic and efficacious metrics. It is demonstrated to outperform several baseline methods by a large margin. |
Yidong Ouyang · Liyan Xie · Chongxuan Li · Guang Cheng
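A rough sketch of the masking idea described above: the denoising regression loss is evaluated only on observed coordinates. The eps_model interface, the alpha_bars schedule tensor, and the batch layout are assumptions for illustration, not the paper's implementation.

```python
import torch

def masked_dsm_loss(eps_model, x0, observed_mask, alpha_bars):
    # observed_mask is 1 where a value is observed and 0 where it is missing.
    b = x0.shape[0]
    t = torch.randint(0, alpha_bars.shape[0], (b,))
    eps = torch.randn_like(x0)
    ab = alpha_bars[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
    pred = eps_model(x_t, t)
    # missing coordinates are simply excluded from the regression target
    se = (observed_mask * (pred - eps) ** 2).sum()
    return se / observed_mask.sum().clamp(min=1.0)
```
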
Empirically Validating Conformal Prediction on Modern Vision Architectures Under Distribution Shift and Long-tailed Data (Poster)
Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. Yet, its performance is known to degrade under distribution shift and long-tailed class distributions, which are often present in real world applications. Here, we characterize the performance of several post-hoc and training-based conformal prediction methods under these settings, providing the first empirical evaluation on large-scale datasets and models. We show that across numerous conformal methods and neural network families, performance greatly degrades under distribution shifts violating safety guarantees. Similarly, we show that in long-tailed settings the guarantees are frequently violated on many classes. Understanding the limitations of these methods is necessary for deployment in real world and safety-critical applications. |
Kevin Kasa · Graham Taylor
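For context, the post-hoc baseline that such evaluations typically start from is split conformal prediction; a textbook version with the $1 - p_{\text{model}}(y \mid x)$ score is sketched below (a generic baseline, not any specific method from the paper).

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    # cal_probs: (n, K) softmax outputs on a held-out calibration set; cal_labels: (n,)
    n = cal_probs.shape[0]
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # finite-sample-corrected quantile level (method="higher" needs numpy >= 1.22)
    q_level = min(np.ceil((n + 1) * (1.0 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    # a label joins the prediction set if its score does not exceed the calibrated threshold
    return (1.0 - test_probs) <= qhat
```
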
CM-GAN: Stabilizing GAN Training with Consistency Models (Poster)
In recent years, generative adversarial networks (GANs) have gained attention for their ability to generate realistic images, despite being notoriously difficult to train. On the other hand, diffusion models have emerged as a promising alternative, offering stable training processes and avoiding mode collapse issues; however, their generation process is computationally expensive. To overcome this problem, Song et al. (2023) proposed consistency models (CMs) that are optimized through a novel consistency constraint induced by the underlying diffusion process. In this paper, we show that the same consistency constraint can be used to stabilize the training of GANs and alleviate the notorious mode collapse problem. In this way, we provide a method to combine the main strengths of diffusions and GANs while mitigating their major drawbacks. Additionally, as the technique can also be viewed as a method to fine-tune the consistency models using a discriminator, its performance is expected to outperform CM in general. We provide preliminary empirical results on MNIST to corroborate our claims. |
Haoye Lu · Yiwei Lu · Dihong Jiang · Spencer Szabados · Sun Sun · Yaoliang Yu

Flow Matching for Scalable Simulation-Based Inference (Poster)
Neural posterior estimation methods based on discrete normalizing flows have become established tools for simulation-based inference (SBI), but scaling them to high-dimensional problems can be challenging. Building on recent advances in generative modeling, we here present flow matching posterior estimation (FMPE), a technique for SBI using continuous normalizing flows. Like diffusion models, and in contrast to discrete flows, flow matching allows for unconstrained architectures, providing enhanced flexibility for complex data modalities. Flow matching, therefore, enables exact density evaluation, fast training, and seamless scalability to large architectures---making it ideal for SBI. We show that FMPE achieves competitive performance on an established SBI benchmark, and then demonstrate its improved scalability on a challenging scientific problem: for gravitational-wave inference, FMPE outperforms methods based on comparable discrete flows, reducing training time by 30\% with substantially improved accuracy. Our work underscores the potential of FMPE to enhance performance in challenging inference scenarios, thereby paving the way for more advanced applications to scientific problems. |
Jonas Wildberger · Maximilian Dax · Simon Buchholz · Stephen R. Green · Jakob Macke · Bernhard Schölkopf 🔗 |
-
|
Learning Linear Causal Representations from Interventions under General Nonlinear Mixing
(
Poster
)
link »
We study the problem of learning causal representations from unknown, latent interventions in a general setting, where the latent distribution is Gaussian but the mixing function is completely general. We prove strong identifiability results given unknown single-node interventions, i.e., without having access to the intervention targets. This generalizes prior works which have focused on weaker classes, such as linear maps or paired counterfactual data. This is also the first instance of causal identifiability from non-paired interventions for deep neural network embeddings. Our proof relies on carefully uncovering the high-dimensional geometric structure present in the data distribution after a non-linear density transformation, which we capture by analyzing quadratic forms of precision matrices of the latent distributions. Finally, we propose a contrastive algorithm to identify the latent variables in practice and evaluate its performance on various tasks. |
Simon Buchholz · Goutham Rajendran · Elan Rosenfeld · Bryon Aragam · Bernhard Schölkopf · Pradeep Ravikumar 🔗 |
-
|
Identifiability of Discretized Latent Coordinate Systems via Density Landmarks Detection
(
Poster
)
link »
Disentanglement aims to recover meaningful latent ground-truth factors from only the observed distribution. Identifiability provides the theoretical grounding for disentanglement to be well-founded. Unfortunately, unsupervised identifiability of independent latent factors is a theoretically proven impossibility in the i.i.d. setting under a general nonlinear smooth map from factors to observations. In this work, we show that, remarkably, it is possible to recover discretized latent coordinates under the most general smooth mapping (diffeomorphism) without any additional inductive bias on the mapping. This holds provided the latent density has axis-aligned discontinuity landmarks, and it does not require the unrealistic assumption of statistical independence of the factors. We introduce this novel form of identifiability and provide a comprehensive proof of the recovery of discretized coordinates. |
Vitória Barin-Pacela · Kartik Ahuja · Simon Lacoste-Julien · Pascal Vincent 🔗 |
-
|
Causal Discovery with Language Models as Imperfect Experts
(
Poster
)
link »
Understanding the causal relationships that underlie a system is a fundamental prerequisite to accurate decision-making. In this work, we explore how expert knowledge can be used to improve the data-driven identification of causal graphs, beyond Markov equivalence classes. In doing so, we consider a setting where we can query an expert about the orientation of causal relationships between variables, but where the expert may provide erroneous information. We propose strategies for amending such expert knowledge based on consistency properties, e.g., acyclicity and conditional independencies in the equivalence class. We then report a case study, on real data, where a large language model is used as an imperfect expert. |
Stephanie Long · Alex Piche · Valentina Zantedeschi · Tibor Schuster · Alexandre Drouin 🔗 |
-
|
HiGen: Hierarchical Graph Generative Networks
(
Poster
)
link »
Most real-world graphs exhibit a hierarchical structure, which is often overlooked by existing graph generation methods. In this work, we introduce HiGen, a Hierarchical Graph Generative Network, to address the limitations of existing generative models by incorporating community structures and cross-level interactions. This approach involves generating graphs in a coarse-to-fine manner, where graph generation at each level is conditioned on a higher-level (lower-resolution) graph. The generation of communities at lower levels is performed in parallel, followed by the prediction of cross-edges between communities using a separate model. This parallelized approach enables high scalability. To capture hierarchical relations, our model allows each node at a given level to depend not only on its neighbouring nodes but also on its corresponding super-node at the higher level. Furthermore, we address the generation of integer-valued edge weights of the hierarchical structure by modeling the output distribution of edges using a multinomial distribution. We show that the multinomial distribution can be factorized successively, enabling the autoregressive generation of each community. This property makes the proposed architecture well-suited for generating graphs with integer-valued edge weights. Furthermore, by breaking down the graph generation process into the generation of multiple small partitions that are conditionally independent of each other, HiGen reduces its sensitivity to a predefined initial ordering of nodes. Empirical studies demonstrate that the proposed generative model captures both local and global properties of graphs and achieves state-of-the-art performance in terms of graph quality on various benchmark graph datasets. |
Mahdi Karami 🔗 |
-
|
Fast and Functional structured data generator
(
Poster
)
link »
In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which affects the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium MCMC effects. This approach improves the model's ability to correctly classify samples and to generate high-quality samples in only a few sampling steps. The effectiveness of this method is demonstrated by learning three datasets with Restricted Boltzmann Machines: handwritten digits for visualization, a human mutation genome dataset classified by continental origin, and sequences of an enzyme protein family categorized by experimental biological function. |
Alessandra Carbone · Aurélien Decelle · Lorenzo Rosset · Beatriz Seoane 🔗 |
-
|
Structured Neural Networks for Density Estimation
(
Poster
)
link »
Given prior knowledge on the conditional independence structure of observed variables, often in the form of Bayesian networks or directed acyclic graphs, it is beneficial to encode such structure into neural networks during learning. This is particularly advantageous in tasks such as density estimation and generative modelling when data is scarce. We propose the Structured Neural Network (StrNN), which masks specific pathways in a neural network. The masks are designed via a novel relationship we explore between neural network architectures and binary matrix factorization, to ensure that the desired conditional independencies are respected and predefined objectives are explicitly optimized. We devise and study practical algorithms for this otherwise NP-hard design problem. We demonstrate the utility of StrNN by applying it to binary and Gaussian density estimation tasks. Our work opens up new avenues for applications such as data-efficient generative modeling with autoregressive flows and causal inference. |
Asic Chen · Ruian Shi · Xiang Gao · Ricardo Baptista · Rahul G. Krishnan 🔗 |
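The core building block, masking pathways in a fully connected layer, can be sketched as below. This is an illustrative PyTorch-style layer under assumed shapes; how the masks themselves are derived from the prescribed independence structure via binary matrix factorization is the paper's contribution and is not shown.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    """Linear layer whose weight is elementwise-multiplied by a fixed binary mask,
    so output j can only depend on inputs i with mask[j, i] == 1."""

    def __init__(self, in_features, out_features, mask):
        super().__init__(in_features, out_features)
        # mask: (out_features, in_features) binary tensor, fixed during training
        self.register_buffer("mask", mask.float())

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)
```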
-
|
Diffusion Generative Inverse Design
(
Poster
)
link »
Inverse design refers to the problem of optimizing the input of an objective function in order to enact a target outcome. For many real-world engineering problems, the objective function takes the form of a simulator that predicts how the system state will evolve over time, and the design challenge is to optimize the initial conditions that lead to a target outcome. Recent developments in learned simulation have shown that graph neural networks (GNNs) can be used for accurate, efficient, differentiable estimation of simulator dynamics, and support high-quality design optimization with gradient- or sampling-based optimization procedures. However, optimizing designs from scratch requires many expensive model queries, and these procedures exhibit basic failures on either non-convex or high-dimensional problems. In this work, we show how denoising diffusion models (DDMs) can be used to solve inverse design problems efficiently and propose a particle sampling algorithm for further improving their efficiency. Experimentally, this approach substantially reduces the number of calls to the simulator compared to standard techniques. |
Marin Vlastelica · Tatiana Lopez-Guevara · Kelsey Allen · Peter Battaglia · Arnaud Doucet · Kimberly Stachenfeld 🔗 |
-
|
Tree Variational Autoencoders
(
Poster
)
link »
We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. The proposed Tree Variational Autoencoder (TreeVAE) hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables, improving generative performance. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on several datasets. Due to its generative nature, TreeVAE can generate new samples from the discovered clusters via conditional sampling. |
Laura Manduchi · Moritz Vandenhirtz · Alain Ryser · Julia Vogt 🔗 |
-
|
Morse Neural Networks for Uncertainty Quantification
(
Poster
)
link »
We introduce a new deep generative model useful for uncertainty quantification: the Morse neural network, which generalizes unnormalized Gaussian densities to have modes on high-dimensional submanifolds instead of just discrete points. Fitting the Morse neural network via a KL-divergence loss yields 1) an (unnormalized) generative density, 2) an OOD detector, 3) a calibration temperature, 4) a generative sampler, along with, in the supervised case, 5) a distance-aware classifier. The Morse network can be used on top of a pre-trained network to bring distance-aware calibration w.r.t. the training data. Because of its versatility, the Morse neural network unifies many techniques: e.g., the Entropic Out-of-Distribution Detector of (Macêdo et al., 2021) in OOD detection, the one-class Deep Support Vector Description method of (Ruff et al., 2018) in anomaly detection, or the Contrastive One Class classifier in continual learning (Sun et al., 2021). The Morse neural network has connections to support vector machines, kernel methods, and Morse theory in topology. |
Benoit Dherin · Huiyi Hu · JIE REN · Michael Dusenberry · Balaji Lakshminarayanan 🔗 |
-
|
STable Permutation-based Framework for Table Generation in Sequence-to-Sequence Models
(
Poster
)
link »
We present a permutation-based text-to-table neural framework that unifies diverse NLP tasks into table outputs. The framework uses a probabilistic approach during training, maximizing the expected log-likelihood across all random permutations of table content factorization. At the inference stage, we optimize model uncertainties and minimize error propagation by leveraging the model's ability to generate cells in any order. Our method accelerates inference by up to 4$\times$ on some datasets and improves text-to-table performance by up to 15\% over previous solutions, all while preserving output quality.
|
Michał Pietruszka · Michał Turski · Łukasz Borchmann · Tomasz Dwojak · Gabriela Pałka · Karolina Szyndler · Dawid Jurkiewicz · Łukasz Garncarek 🔗 |
-
|
Augmenting Control over Exploration Space in Molecular Dynamics Simulators to Streamline De Novo Analysis through Generative Control Policies
(
Poster
)
link »
This study introduces the P5 model - a groundbreaking method that utilizes reinforcement learning (RL) to augment control, effectiveness, and scalability in molecular dynamics simulations (MD). Our innovative strategy optimizes the sampling of target polymer chain conformations, marking an efficiency improvement of over 37.1%. The RL-induced control policies function as an inductive bias, modulating Brownian forces to steer the system towards the preferred state, thereby expanding the exploration of the configuration space beyond what traditional MD allows. This broadened exploration generates a more varied set of conformations and targets specific properties, a feature pivotal for progress in polymer development, drug discovery, and material design. Our technique offers significant advantages when investigating new systems with limited prior knowledge, opening up new methodologies for tackling complex simulation problems with generative techniques. |
Paloma Gonzalez-Rojas · Gregory Rutledge 🔗 |
-
|
Neuro-Causal Factor Analysis
(
Poster
)
link »
We revisit nonlinear factor analysis from a comparatively new perspective given by advancements in causal discovery and deep learning, introducing a framework for Neuro-Causal Factor Analysis (NCFA). Our approach is fully nonparametric: It identifies factors via latent causal discovery methods and then uses a variational autoencoder (VAE) that is constrained to abide by the Markov factorization of the distribution with respect to the learned graph. We evaluate NCFA on real and synthetic data sets, finding that it performs comparably to standard VAEs on data reconstruction tasks but with the advantages of sparser architecture, lower model complexity, and causal interpretability. Unlike traditional factor analysis methods, our NCFA method allows learning and reasoning about the latent factors underlying observed data from a justifiably causal perspective, even when the relations between factors and measurements are highly nonlinear. |
Alex Markham · Mingyu Liu · Bryon Aragam · Liam Solus 🔗 |
-
|
Autoregressive Diffusion Models with non-Uniform Generation Order
(
Poster
)
link »
Diffusion models for discrete data have gained increasing interest lately. Recent methods use an autoregressive formulation, but where the generation order is random. In this work, we turn our attention to the distribution of the generation order. Instead of using a uniform distribution over all possible orders, we propose to limit the distribution for facilitating learning the generative model, while still keeping the benefit of not having to rely on a fixed generation order. We empirically show how limiting the generation order can improve the generative performance in generating molecular graphs. |
Filip Ekström Kelvinius · Fredrik Lindsten 🔗 |
-
|
Variational Point Encoding Deformation for Dental Modeling
(
Poster
)
link »
We introduce VF-Net, a probabilistic extension of FoldingNet, for learning representations of point cloud data. VF-Net overcomes the limitations of existing models by incorporating a 1-to-1 mapping between input and output points. By eliminating the need for Chamfer distance optimization, this approach enables the development of a fully probabilistic model. We demonstrate that VF-Net outperforms other models in dental reconstruction tasks, including shape completion and tooth wear simulation. The learned latent representations exhibit robustness and enable meaningful interpolation between dental scans. |
Johan Ye · Thomas Ørkild · Peter Søndergard · Søren Hauberg 🔗 |
-
|
BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery
(
Oral
)
link »
Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over the combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. In this work, we introduce a scalable Bayesian causal discovery framework based on stochastic gradient Markov Chain Monte Carlo (SG-MCMC) that directly samples DAGs from the posterior without any DAG regularization, simultaneously draws function parameter samples, and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines. |
Yashas Annadani · Nick Pawlowski · Joel Jennings · Stefan Bauer · Cheng Zhang · Wenbo Gong 🔗 |
-
|
On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization
(
Poster
)
link »
The emergence of various notions of "consistency" in diffusion models has garnered considerable attention and helped achieve improved sample quality, likelihood estimation, and accelerated sampling. Although similar concepts have been proposed in the literature, the precise relationships among them remain unclear. In this study, we establish theoretical connections between three recent "consistency" notions designed to enhance diffusion models for distinct objectives. Our insights offer the potential for a more comprehensive and encompassing framework for consistency-type models. |
Chieh-Hsin Lai · Yuhta Takida · Toshimitsu Uesaka · Naoki Murata · Yuki Mitsufuji · Stefano Ermon 🔗 |
-
|
Uncovering Latent Structure Using Random Partition Models
(
Poster
)
link »
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We overcome this limitation by proposing a novel two-step method for inferring partitions, which allows its usage in variational inference tasks. This new approach enables reparameterized gradients with respect to the parameters of the new random partition model. Our method works by first inferring the number of elements per subset and, second, by filling these subsets in a learned order. We highlight the versatility of our general-purpose approach on two different challenging experiments: variational clustering and inference of shared and independent generative factors under weak supervision. |
Thomas Sutter · Alain Ryser · Joram Liebeskind · Julia Vogt 🔗 |
-
|
Improving Training of Likelihood-based Generative Models with Gaussian Homotopy
(
Poster
)
link »
Generative Models (GMs) have become popular for their success in various domains. In computer vision, for instance, they are able to generate astonishingly realistic-looking images. Likelihood-based GMs are fast at generating new samples, given that they need a single model evaluation per sample, but their sample quality is usually lower than that of score-based Diffusion Models (DMs). In this work, we verify that the success of score-based DMs is in part due to the process of data smoothing, by incorporating this process in the training of likelihood-based GMs. In the optimization literature, this process of data smoothing is referred to as Gaussian homotopy (GH), and it has strong theoretical grounding. Crucially, GH does not incur computational overheads, and it can be implemented by adding one line of code in the optimization loop. Results on image datasets, with models including Variational Autoencoders and Normalizing Flows, demonstrate significant improvements in the generation quality of likelihood-based GMs. |
Ba-Hien Tran · Giulio Franzese · Pietro Michiardi · Maurizio Filippone 🔗 |
-
|
Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains
(
Poster
)
link »
Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. EBMs). The idea is to fit the score of the distribution (i.e. $\nabla_x \log p(x)$), rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there's a clear algorithmic benefit, the statistical "cost" can be steep: recent work by Koehler et al '23 showed that for distributions that have poor isoperimetric properties (a large Poincaré or log-Sobolev constant), score matching is substantially statistically less efficient than maximum likelihood. However, many natural realistic distributions, e.g. multimodal distributions as simple as a mixture of two Gaussians---even in one dimension---have a poor Poincaré constant. In this paper, we show a close connection between the mixing time of an arbitrary Markov process with generator $\mathcal{L}$ and a generalized score matching loss that tries to fit $\frac{\mathcal{O} p}{p}$. We instantiate this framework with several examples. In the special case of $\mathcal{O} = \nabla_x$, and $\mathcal{L}$ being the generator of Langevin diffusion, this generalizes and recovers the results from Koehler et al '23. If $\mathcal{L}$ corresponds to a Markov process corresponding to a continuous version of simulated tempering, we show the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed in Song-Ermon '19. Moreover, we show that if the distribution being learned is a mixture of $K$ Gaussians in $d$ dimensions, the sample complexity of annealed score matching is polynomial in $d$ and $K$---obviating the Poincaré-constant-based lower bounds for the basic score matching loss shown in Koehler et al '23. This is the first result characterizing the benefits of annealing for score matching---a crucial component in more sophisticated score-based approaches like Song-Ermon '19.
|
Yilong Qin · Andrej Risteski 🔗 |
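For orientation, the classical (Hyvärinen) score matching objective, which the generalized loss above recovers when $\mathcal{O} = \nabla_x$, can be written after integration by parts as

$$J(\theta) = \tfrac{1}{2}\,\mathbb{E}_{x\sim p}\!\left[\left\|\nabla_x \log p_\theta(x) - \nabla_x \log p(x)\right\|^2\right] = \mathbb{E}_{x\sim p}\!\left[\tfrac{1}{2}\left\|\nabla_x \log p_\theta(x)\right\|^2 + \Delta_x \log p_\theta(x)\right] + \text{const},$$

so the score model can be trained from samples of $p$ alone, without the normalization constant; the generalized loss replaces $\nabla_x$ with the operator $\mathcal{O}$ attached to the chosen Markov process.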
-
|
Lexinvariant Language Models
(
Poster
)
link »
Token embeddings, a mapping from discrete lexical symbols to continuous vectors, are at the heart of any language model (LM). However, lexical symbol meanings can also be determined and even redefined by their structural role in a long context. In this paper, we ask: is it possible for a language model to be performant without \emph{any} fixed token embeddings? Such a language model would have to rely entirely on the co-occurrence and repetition of tokens in the context rather than the \textit{a priori} identity of any token. To answer this, we study \textit{lexinvariant} language models that are invariant to lexical symbols and therefore do not need fixed token embeddings in practice. First, we prove that we can construct a lexinvariant LM to converge to the true language model at a uniform rate that is polynomial in terms of the context length, with a constant factor that is sublinear in the vocabulary size. Second, to build a lexinvariant LM, we simply encode tokens using random Gaussian vectors, such that each token maps to the same representation within each sequence but different representations across sequences. Empirically, we demonstrate that it can indeed attain perplexity comparable to that of a standard language model, given a sufficiently long context. We further explore two properties of lexinvariant language models: First, given text generated from a substitution cipher of English, it implicitly implements Bayesian in-context deciphering and infers the mapping to the underlying real tokens with high accuracy. Second, it has on average 4x better accuracy on synthetic in-context reasoning tasks. Finally, we discuss regularizing standard language models towards lexinvariance and potential practical applications. |
Qian Huang · Eric Zelikman · Sarah Chen · Yuhuai Wu · Greg Valiant · Percy Liang 🔗 |
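The per-sequence random embedding step described above can be sketched as follows. This is an editorial illustration under assumed tensor shapes, not the authors' implementation.

```python
import torch

def lexinvariant_embed(token_ids, d_model):
    """Embed a single sequence with fresh random Gaussian token vectors.

    Every occurrence of the same token id within this sequence shares one vector,
    but a new table is drawn for every sequence, so no fixed embeddings are learned.
    token_ids: (seq_len,) integer tensor
    """
    vocab_in_context = token_ids.unique()                       # sorted unique ids
    table = torch.randn(len(vocab_in_context), d_model) / d_model ** 0.5
    # Map each token id to its row in the per-sequence random table.
    index = torch.searchsorted(vocab_in_context, token_ids)
    return table[index]                                         # (seq_len, d_model)
```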
-
|
BatchGFN: Generative Flow Networks for Batch Active Learning
(
Poster
)
link »
We introduce BatchGFN—a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward. With an appropriate reward function to quantify the utility of acquiring a batch, such as the joint mutual information between the batch and the model parameters, BatchGFN is able to construct highly informative batches for active learning. We show our approach enables sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems. This alleviates the computational complexity of batch-aware algorithms and removes the need for greedy approximations to find maximizers for the batch reward. We also present early results for amortizing training across acquisition steps, which will enable scaling to real-world tasks. |
Shreshth Malik · Salem Lahlou · Andrew Jesson · Moksh Jain · Nikolay Malkin · Tristan Deleu · Yoshua Bengio · Yarin Gal 🔗 |
-
|
Balanced Training of Energy-Based Models with Adaptive Flow Sampling
(
Poster
)
link »
Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data. |
Louis Grenioux · Eric Moulines · Marylou Gabrié 🔗 |
-
|
PRODIGY: Enabling In-context Learning Over Graphs
(
Poster
)
link »
In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored. In this paper, we develop Pretraining Over Diverse In-Context Graph Systems (PRODIGY), the first pretraining framework that enables in-context learning over graphs. The key idea of our framework is to formulate in-context learning over graphs with a novel \emph{prompt graph} representation, which connects prompt examples and queries. We then propose a graph neural network architecture over the prompt graph and a corresponding family of in-context pretraining objectives. With PRODIGY, the pretrained model can directly perform novel downstream classification tasks on unseen graphs via in-context learning. We provide empirical evidence of the effectiveness of our framework by showcasing its strong in-context learning performance on tasks involving citation networks and knowledge graphs. Our approach outperforms the in-context learning accuracy of contrastive pretraining baselines with hard-coded adaptation by 18\% on average across all setups. Moreover, it also outperforms standard finetuning with limited data by 33\% on average with in-context learning. |
Qian Huang · Hongyu Ren · Peng Chen · Gregor Kržmanc · Daniel Zeng · Percy Liang · Jure Leskovec 🔗 |
-
|
Dimensionality Reduction as Probabilistic Inference
(
Poster
)
link »
Dimensionality reduction (DR) algorithms compress high-dimensional data into a lower dimensional representation while preserving important features of the data. DR is a critical step in many analysis pipelines as it enables visualisation, noise reduction and efficient downstream processing of the data. In this work, we introduce the ProbDR variational framework, which interprets a wide range of classical DR algorithms as probabilistic inference algorithms in this framework. ProbDR encompasses PCA, CMDS, LLE, LE, MVU, diffusion maps, kPCA, Isomap, (t-)SNE, and UMAP. In our framework, a low-dimensional latent variable is used to construct a covariance, precision, or a graph Laplacian matrix, which can be used as part of a generative model for the data. Inference is done by optimizing an evidence lower bound. We demonstrate the internal consistency of our framework and show that it enables the use of probabilistic programming languages (PPLs) for DR. Additionally, we illustrate that the framework facilitates reasoning about unseen data and argue that our generative models approximate Gaussian processes (GPs) on manifolds. By providing a unified view of DR, our framework facilitates communication, reasoning about uncertainties, model composition, and extensions, particularly when domain knowledge is present. |
Aditya Ravuri · Francisco Vargas · Vidhi Ramesh · Neil Lawrence 🔗 |
-
|
DiffMol: 3D Structured Molecule Generation with Discrete Denoising Diffusion Probabilistic Models
(
Poster
)
link »
3D structures of molecules are often required to investigate atomistic phenomena accurately in industries such as drug design. We propose DiffMol, a novel method that utilizes diffusion models to generate the 3D position of atoms and utilizes the discrete denoising diffusion process to generate the atom type. Compared to existing methods, our algorithm offers greater flexibility for post-processing and refining the generated molecules and demonstrates faster performance. We provide theoretical proof of the equivariance of the diffusion process for molecule position generation. Our model achieved better than state-of-the-art performance in molecule/atom stability and molecule validity on benchmarks generating 3D molecules. |
Weitong Zhang · Xiaoyun Wang · Justin Smith · Joe Eaton · Brad Rees · Quanquan Gu 🔗 |
-
|
Diffusion Probabilistic Models for Structured Node Classification
(
Poster
)
link »
This paper studies structured node classification on graphs, where the predictions should consider dependencies between the node labels. In particular, we focus on solving the problem for partially labeled graphs where it is essential to incorporate the information in the known label for predicting the unknown labels. To address this issue, we propose a novel framework leveraging the diffusion probabilistic model for structured node classification (DPM-SNC). At the heart of our framework is the extraordinary capability of DPM-SNC to (a) learn a joint distribution over the labels with an expressive reverse diffusion process and (b) make predictions conditioned on the known labels utilizing manifold-constrained sampling. Since the DPMs lack training algorithms for partially labeled data, we design a novel training algorithm to apply DPMs, maximizing a new variational lower bound. We also theoretically analyze how DPMs benefit node classification by enhancing the expressive power of GNNs. We extensively verify the superiority of our DPM-SNC in diverse scenarios, which include not only the transductive setting but also the inductive setting. |
Hyosoon Jang · Seonghyun Park · Sangwoo Mo · Sungsoo Ahn 🔗 |
-
|
Large Dimensional Change Point Detection with FWER Control as Automatic Stopping
(
Poster
)
link »
We propose a statistical inference method for detecting change points in time series of large panel data. The change points can have a general impact on different subsets of the panel. Our novel statistical perspective on high-dimensional change point detection combines selective inference and multiple testing. Our easy-to-use and computationally efficient procedure has two stages: First, LASSO regressions for each time series screen a candidate set of change points. Second, we apply post-selection inference with a novel multiple testing adjustment to select the change points. Our method controls the panel family-wise error rate with theoretical guarantees, hence guarding against p-hacking without the need for tuning parameters. In extensive simulations, our method outperforms leading benchmarks in terms of correct selections and false discovery. We achieve higher detection rates and make fewer Type I errors, leading to over 20% higher F1 classification scores. |
Jiacheng Zou · Yang Fan · Markus Pelger 🔗 |
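As an illustration of the first-stage screening for a single series, one can regress the series on step indicators and keep the breaks with nonzero LASSO coefficients. The sketch below is a simplified editorial example (the function name, penalty level, and single-series setting are assumptions), not the authors' procedure for the full panel.

```python
import numpy as np
from sklearn.linear_model import Lasso

def screen_changepoints(y, alpha=0.1):
    """Screen candidate change points in one series via a LASSO on step dummies.

    Column j of the design equals 1 from time j onward, so a nonzero coefficient
    corresponds to a level shift starting at time j.
    """
    T = len(y)
    X = np.tril(np.ones((T, T)))[:, 1:]        # step indicators for breaks at t = 1..T-1
    model = Lasso(alpha=alpha, fit_intercept=True).fit(X, y)
    return np.flatnonzero(model.coef_) + 1     # candidate break times
```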
-
|
Score-based Enhanced Sampling for Protein Molecular Dynamics
(
Poster
)
link »
The dynamic nature of proteins is crucial for determining their biological functions and properties, and molecular dynamics (MD) simulations stand as a predominant tool to study such phenomena. By utilizing empirically derived force fields, MD simulations explore the conformational space through numerically evolving the system along MD trajectories. However, the high energy barriers of the force fields can hamper the exploration of MD, resulting in inadequately sampled ensembles. In this paper, we propose leveraging score-based generative models (SGMs) trained on large-scale general protein structures to perform protein conformational sampling to complement traditional MD simulations. Experimental results demonstrate the effectiveness of our approach on several benchmark systems by comparing the results with long MD trajectories and state-of-the-art generative structure prediction models. |
Jiarui Lu · Bozitao Zhong · Jian Tang 🔗 |
-
|
Geometric Constraints in Probabilistic Manifolds: A Bridge from Molecular Dynamics to Structured Diffusion Processes
(
Poster
)
link »
Understanding the macroscopic characteristics of biological complexes demands precision and specificity in statistical ensemble modeling. One of the primary challenges in this domain lies in sampling from particular subsets of the state-space, driven either by existing structural knowledge or specific areas of interest within the state-space. We propose a method that enables sampling from distributions that rigorously adhere to arbitrary sets of geometric constraints in Euclidean spaces. This is achieved by integrating a constraint projection operator within the well-regarded architecture of Denoising Diffusion Probabilistic Models, a framework founded in generative modeling and probabilistic inference. The significance of this work becomes apparent, for instance, in the context of deep learning-based drug design, where it is imperative to maintain specific molecular profile interactions to realize the desired therapeutic outcomes and guarantee safety. |
Justin Diamond · Markus Lill 🔗 |
-
|
Reinforcement Learning-Driven Linker Design via Fast Attention-based Point Cloud Alignment
(
Poster
)
link »
PROteolysis-TArgeting Chimeras (PROTACs), which are comprised of two protein-binding domains connected via a linker, are a novel class of small molecules that enable the degradation of disease-relevant proteins. The design and optimization of the linker portion is challenging due to geometric and chemical constraints given by its interactions, and the need to maximize drug-likeness. To tackle these challenges, we introduce ShapeLinker, a method for de novo design of linkers that performs fragment-linking using reinforcement learning on an autoregressive SMILES generator. The method optimizes for a composite score combining relevant physicochemical properties and a novel, attention-based point cloud alignment score, which allows capturing a desired geometry to link the anchor and warhead. This method successfully generates linkers that satisfy 2D and 3D requirements, achieving state-of-the-art results in linker design for more efficient PROTAC optimization. |
Rebecca Manuela Neeser · Mehmet Akdel · Daniel Kovtun · Luca Naef 🔗 |
-
|
AbODE: Ab initio antibody design using conjoined ODEs
(
Poster
)
link »
Antibodies are Y-shaped proteins that neutralize pathogens and constitute the core of our adaptive immune system. De novo generation of new antibodies that target specific antigens holds the key to accelerating vaccine discovery. However, this co-design of the amino acid sequence and the 3D structure subsumes and accentuates some central challenges from multiple tasks, including protein folding, inverse folding, and docking. We strive to surmount these challenges with a new generative model, AbODE, that extends graph PDEs to accommodate both contextual information and external interactions. Unlike existing approaches, AbODE uses a single round of full-shot decoding and elicits continuous differential attention that encapsulates, and evolves with, latent interactions within the antibody as well as those involving the antigen. We unravel fundamental connections between AbODE and temporal networks as well as graph-matching networks. The proposed model significantly outperforms existing methods on standard metrics across benchmarks. |
Yogesh Verma · Markus Heinonen · Vikas K Garg 🔗 |
-
|
Inferring Hierarchical Structure in Multi-Room Maze Environments
(
Poster
)
link »
Cognitive maps play a crucial role in facilitating flexible behaviour by representing spatial and conceptual relationships within an environment. The ability to learn and infer the underlying structure of the environment is crucial for effective exploration and navigation. This paper introduces a hierarchical active inference model addressing the challenge of inferring structure in the world from pixel-based observations. We propose a three-layer hierarchical model consisting of a cognitive map, an allocentric, and an egocentric world model, combining curiosity-driven exploration with goal-oriented behaviour at the different levels of reasoning from context to place to motion. This allows for efficient exploration and goal-directed search in room-structured mini-grid environments. |
Daria de Tinguy · Toon Van de Maele · Tim Verbelen · Bart Dhoedt 🔗 |
-
|
Parallel Sampling of Diffusion Models
(
Poster
)
link »
Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that surprisingly it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 16s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score. |
Andy Shih · Suneel Belkhale · Stefano Ermon · Dorsa Sadigh · Nima Anari 🔗 |
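The idea of guessing a whole trajectory and refining it can be illustrated with a generic Picard iteration for a fixed-step discretization. This is an editorial sketch with assumed interfaces; the paper's sampler operates on pretrained diffusion models with sliding windows and batched network evaluations, which are omitted here.

```python
import numpy as np

def picard_sample(drift, x_init, n_steps, dt, n_sweeps=20, tol=1e-4):
    """Solve x_{t+1} = x_t + drift(x_t, t) * dt for all t by Picard iteration.

    Instead of stepping sequentially, guess the whole trajectory and refine it:
    every sweep evaluates the drift at all current guesses, which in practice
    can be batched through the network in parallel.
    x_init: (dim,) starting point; returns the final state after n_steps.
    """
    traj = np.tile(x_init, (n_steps + 1, 1))            # initial guess: constant trajectory
    for _ in range(n_sweeps):
        drifts = np.stack([drift(traj[t], t) for t in range(n_steps)])
        new = x_init + np.cumsum(drifts * dt, axis=0)   # x_t = x_0 + sum_{s<t} f(x_s, s) dt
        new = np.vstack([x_init[None], new])
        converged = np.max(np.abs(new - traj)) < tol
        traj = new
        if converged:
            break
    return traj[-1]
```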
-
|
PITS: Variational Pitch Inference Without Fundamental Frequency for End-to-End Pitch-Controllable TTS
(
Poster
)
link »
Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech. To address this issue, we propose PITS, an end-to-end pitch-controllable TTS model that utilizes variational inference to model pitch. Based on VITS, PITS incorporates the Yingram encoder, the Yingram decoder, and adversarial training of pitch-shifted synthesis to achieve pitch-controllability. Experiments demonstrate that PITS generates high-quality speech that is indistinguishable from ground truth speech and has high pitch-controllability without quality degradation. Code and audio samples will be available at https://github.com/anonymous-pits/pits. |
Junhyeok Lee · Wonbin Jung · Hyunjae Cho · Jaeyeon Kim · Jaehwan Kim 🔗 |
-
|
Regularized Data Programming with Automated Bayesian Prior Selection
(
Poster
)
link »
The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation in the Bayesian model. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves performance relative to maximum likelihood and majority voting, confers greater interpretability, and bolsters performance in low-data regimes. |
Jacqueline Maasch · Hao Zhang · Qian Yang · Fei Wang · Volodymyr Kuleshov 🔗 |
-
|
On the Identifiability of Markov Switching Models
(
Poster
)
link »
In the realm of interpretability and out-of-distribution generalization, the identifiability of latent variable models has emerged as a captivating field of inquiry. In this work, we delve into the identifiability of Markov Switching Models, taking an initial stride toward extending recent results to sequential latent variable models. We develop identifiability conditions for first-order Markov dependency structures, whose transition distribution is parametrised via non-linear Gaussians. Through empirical studies, we demonstrate the practicality of our approach in facilitating regime-dependent causal discovery and segmenting high-dimensional time series data. |
Carles Balsells Rodas · Yixin Wang · Yingzhen Li 🔗 |
-
|
Optimizing protein fitness using Bi-level Gibbs sampling with Graph-based Smoothing
(
Poster
)
link »
The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose Bi-level Gibbs sampling with Graph-based Smoothing (BiGGS), which uses the gradients of a trained fitness predictor to sample many mutations towards higher fitness. Bi-level Gibbs first samples sequence locations and then sequence edits. We introduce graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We study the GFP and AAV design problems, ablations, and baselines to elucidate the results. |
Andrew Kirjner · Jason Yim · Raman Samusevich · Tommi Jaakkola · Regina Barzilay · Ila R. Fiete 🔗 |
-
|
Robust and Scalable Bayesian Online Changepoint Detection
(
Poster
)
link »
This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection. The resulting algorithm has key advantages over previous work: it provides provable robustness by leveraging the generalised Bayesian perspective and also addresses the scalability issues of previous attempts. Specifically, the proposed generalised Bayesian formalism leads to conjugate posteriors whose parameters are available in closed form by leveraging diffusion score matching. The resulting algorithm is exact and can be updated through simple algebra. |
Matias Altamirano · Francois-Xavier Briol · Jeremias Knoblauch 🔗 |
-
|
GSURE-Based Diffusion Model Training with Corrupted Data
(
Poster
)
link »
Diffusion models have demonstrated impressive results in both data generation and downstream tasks such as inverse problems, text-based editing, classification, and more. However, training such models usually requires large amounts of clean signals which are often difficult or impossible to obtain. In this work, we propose a novel training technique for generative diffusion models based only on corrupted data. We introduce a loss function based on the Generalized Stein's Unbiased Risk Estimator (GSURE), and prove that under some conditions, it is equivalent to the training objective used in fully supervised diffusion models. We demonstrate our technique on face images as well as Magnetic Resonance Imaging (MRI), where the use of undersampled data significantly alleviates data collection costs. Our approach achieves generative performance comparable to its fully supervised counterpart without training on any clean signals. In addition, we deploy the resulting diffusion model in various downstream tasks beyond the degradation present in the training set, showcasing promising results. |
Bahjat Kawar · Noam Elata · Tomer Michaeli · Michael Elad 🔗 |
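For context, in the simplest Gaussian-denoising setting $y = x + n$ with $n \sim \mathcal{N}(0, \sigma^2 I_d)$, the classical Stein unbiased risk estimate of the reconstruction error of an estimator $f$ is

$$\mathrm{SURE}(f, y) = \|y - f(y)\|^2 - d\sigma^2 + 2\sigma^2\, \nabla_y \cdot f(y),$$

which can be evaluated from the corrupted measurement $y$ alone. The paper builds its training loss on the generalized version (GSURE), which handles more general degradation operators; the formula above is shown only as the familiar special case.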
-
|
Hierarchical Graph Generation with $K^{2}$-trees
(
Poster
)
link »
Generating graphs from a target distribution is a significant challenge across many domains, including drug discovery and social network analysis. In this work, we introduce a novel graph generation method leveraging $K^{2}$-tree representation which was originally designed for lossless graph compression. Our motivation stems from the ability of the $K^{2}$-trees to enable compact generation while concurrently capturing the inherent hierarchical structure of a graph. In addition, we make further contributions by (1) presenting a sequential $K^{2}$-tree representation that incorporates pruning, flattening, and tokenization processes and (2) introducing a Transformer-based architecture designed to generate the sequence by incorporating a specialized tree positional encoding scheme. Finally, we extensively evaluate our algorithm on four general and two molecular graph datasets to confirm its superiority for graph generation.
|
Yunhui Jang · Dongwoo Kim · Sungsoo Ahn 🔗 |
-
|
Bootstrapped Training of Score-Conditioned Generator for Offline Design of Biological Sequences
(
Poster
)
link »
We study the problem of optimizing biological sequences, e.g., proteins, DNA, and RNA, to maximize a black-box score function that is only evaluated in an offline dataset. We propose a novel solution, bootstrapped training of score-conditioned generator (BootGen) algorithm. Our algorithm repeats a two-stage process. In the first stage, our algorithm trains the biological sequence generator with rank-based weights to enhance the accuracy of sequence generation based on high scores. The subsequent stage involves bootstrapping, which augments the training dataset with self-generated data labeled by a proxy score function. Our key idea is to align the score-based generation with a proxy score function, which distills the knowledge of the proxy score function to the generator. After training, we aggregate samples from multiple bootstrapped generators and proxies to produce a diverse design. Extensive experiments show that our method outperforms competitive baselines on biological sequential design tasks. |
Minsu Kim · Federico Berto · Sungsoo Ahn · Jinkyoo Park 🔗 |
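A schematic of one bootstrapped training round might look as follows. This is an editorial sketch: `generator.fit`, `generator.sample`, and `proxy` are hypothetical interfaces standing in for the sequence generator and proxy score function, and the rank-based weighting shown is only one simple choice.

```python
def bootgen_round(generator, proxy, dataset, k_new=256):
    """One schematic round: rank-weighted training, then bootstrapping.

    dataset: list of (sequence, score) pairs from the offline data plus any
    previously bootstrapped samples.
    """
    # Stage 1: rank-based weighting emphasises high-scoring sequences.
    ranked = sorted(dataset, key=lambda pair: pair[1])
    weights = [(i + 1) / len(ranked) for i in range(len(ranked))]
    generator.fit([seq for seq, _ in ranked], weights)

    # Stage 2: bootstrap - augment the dataset with self-generated,
    # proxy-labelled sequences, distilling the proxy into the generator.
    new_seqs = generator.sample(k_new)
    dataset = dataset + [(seq, proxy(seq)) for seq in new_seqs]
    return dataset
```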
-
|
Thompson Sampling for Improved Exploration in GFlowNets
(
Poster
)
link »
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work. |
Jarrid Rector-Brooks · Kanika Madan · Moksh Jain · Maksym Korablyov · Chenghao Liu · Sarath Chandar · Nikolay Malkin · Yoshua Bengio 🔗 |
-
|
GFlowNets for Causal Discovery: an Overview
(
Poster
)
link »
Causal relationships underpin modern science and our ability to reason. Automatically discovering useful causal relationships can greatly accelerate scientific progress and facilitate the creation of machines that can reason like we do. Traditionally, the dominant approaches to causal discovery are statistical, such as the PC algorithm. A new area of research is integrating recent advances in machine learning with causal discovery. We focus on a series of recent works that leverage new algorithms in deep learning for causal discovery -- notably, generative flow networks (GFlowNets). We discuss the unique perspectives GFlowNets bring to causal discovery. |
Dragos Cristian Manta · Edward Hu · Yoshua Bengio 🔗 |
-
|
Concept Algebra for Score-based Conditional Model
(
Poster
)
link »
This paper concerns the structure of learned representations in text-guided generative models, focusing on score-based models. A key property of such models is that they can compose disparate concepts in a 'disentangled' manner. This suggests these models have internal representations that encode concepts in a 'disentangled' manner. Here, we focus on the idea that concepts are encoded as subspaces of some representation space. We formalize what this means, show there's a natural choice for the representation, and develop a simple method for identifying the part of the representation corresponding to a given concept. In particular, this allows us to manipulate the concepts expressed by the model through algebraic manipulation of the representation. We demonstrate the idea with examples using Stable Diffusion. |
Zihao Wang · Lin Gui · Jeffrey Negrea · Victor Veitch 🔗 |
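The subspace view can be illustrated with a small linear-algebra sketch: editing a representation by swapping its component in a concept subspace. This is an editorial example; in the paper the relevant representation and concept subspaces are identified inside the score model, which is not shown here.

```python
import numpy as np

def edit_concept(rep, concept_basis, target_rep):
    """Replace the component of `rep` in the concept subspace with that of `target_rep`.

    rep, target_rep: (d,) representation vectors
    concept_basis:   (d, k) orthonormal basis of the concept subspace
    """
    P = concept_basis @ concept_basis.T        # projector onto the concept subspace
    return rep - P @ rep + P @ target_rep      # keep everything else, swap the concept
```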
-
|
Diffusion Models with Grouped Latents for Interpretable Latent Space
(
Poster
)
link »
Latent variable models are useful tools for discovering independent generative factors of data without human supervision. From an ODE formulation, diffusion models are invertible latent variable models, but unlike other models like VAEs, their latent variables are often not interpretable. For example, traversing a single element of the latent noise does not lead to a meaningful variation of generated contents. To settle this issue, we propose to divide a latent vector into multiple groups of elements and design different noise schedules for each group. By doing so, we can allow each group to control only certain elements of data, explicitly giving interpretable meaning. Applying our method in the frequency domain, the latent variable becomes a hierarchical representation where individual groups encode data at different levels of abstraction. We show several applications of such representation including disentanglement of semantic attributes or image editing. |
Sangyun Lee · Gayoung Lee · Hyunsu Kim · Kim Junho · Youngjung Uh 🔗 |
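The group-wise noise schedules can be sketched as a small change to the usual forward diffusion step. This is an editorial illustration under assumed shapes, not the authors' code.

```python
import torch

def grouped_forward_diffusion(x0, t, group_ids, alphas_bar_per_group):
    """Add noise with a different schedule for each latent group.

    x0:                   (batch, dim) data (e.g. frequency-domain coefficients)
    t:                    scalar timestep index
    group_ids:            (dim,) integer group index of each coordinate
    alphas_bar_per_group: (n_groups, T) cumulative schedules, one row per group
    """
    noise = torch.randn_like(x0)
    a_bar = alphas_bar_per_group[group_ids, t]            # (dim,) schedule value per coordinate
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # groups are noised at different rates
    return x_t, noise
```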
-
|
Multilevel Control Functional
(
Poster
)
link »
Control variates are variance reduction techniques for Monte Carlo estimators. They can reduce the cost of the estimation of integrals involving computationally expensive scientific models. We propose an extension of control variates, multilevel control functional (MLCF), which uses non-parametric Stein-based control variates and multifidelity models with lower cost to gain better performance. MLCF is widely applicable. We show that when the integrand and the density are smooth, and when the dimensionality is not very high, MLCF enjoys a fast convergence rate. We provide both theoretical analysis and empirical assessments on differential equation examples, including a Bayesian inference for ecological model example, to demonstrate the effectiveness of our proposed approach. |
Kaiyu Li · Zhuo Sun 🔗 |
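For reference, the basic control variate identity that control functionals build on is

$$\hat{\mu}_{\mathrm{CV}} = \frac{1}{N}\sum_{i=1}^{N}\bigl(f(x_i) - g(x_i)\bigr) + \mathbb{E}[g(X)], \qquad \operatorname{Var}\bigl(\hat{\mu}_{\mathrm{CV}}\bigr) = \frac{\operatorname{Var}\bigl(f(X) - g(X)\bigr)}{N},$$

so variance is reduced whenever $g$ tracks $f$ closely. Stein-based control functionals construct $g$ nonparametrically so that $\mathbb{E}[g(X)] = 0$ by design, and the multilevel extension additionally exploits cheaper multifidelity models, as described in the abstract above.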
-
|
HINT: Hierarchical Coherent Networks For Constrained Probabilistic Forecasting
(
Poster
)
link »
Large collections of time series data are commonly organized into hierarchies with different levels of aggregation. We present Hierarchical Coherent Networks (HINT), a forecasting framework that adheres to these aggregation constraints. We specialize HINT for the task via a multivariate mixture optimized with composite likelihood and made coherent via bootstrap reconciliation. Additionally, we robustify the networks to stark time series scale variations by incorporating normalized feature extraction and recomposition of output scales within their architecture. We demonstrate improved accuracy compared to the existing state of the art. We provide ablation studies on our model's components and establish its solid theoretical foundations. HINT's code is available at this http URL. |
Kin Gutierrez · David Luo · Cristian Challu · Stefania La Vattiata · Max Mergenthaler Canseco · Artur Dubrawski 🔗 |