

Session

Bayesian Non-parametrics


Wed 12 June 14:00 - 14:20 PDT

Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with double power-law behavior

Fadhel Ayed · Juho Lee · Francois Caron

Bayesian nonparametric approaches, in particular the Pitman-Yor process and the associated two-parameter Chinese Restaurant process, have been successfully used in applications where the data exhibit a power-law behavior. Examples include natural language processing, natural images, and networks. There is also growing empirical evidence that some datasets exhibit a two-regime power-law behavior: one regime for small frequencies, and a second regime, with a different exponent, for high frequencies. In this paper, we introduce a class of completely random measures which are doubly regularly-varying. We show that, contrary to the Pitman-Yor process, when completely random measures in this class are normalized to obtain random probability measures and associated random partitions, such partitions exhibit a double power-law behavior. We discuss in particular three models within this class: the beta prime process (Broderick et al., 2015, 2018), a novel process called the generalized BFRY process, and a mixture construction. We derive efficient Markov chain Monte Carlo algorithms to estimate the parameters of these models. Finally, we show that the proposed models provide a better fit than the Pitman-Yor process on various datasets.
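
As a point of reference for the power-law behavior discussed above, the sketch below simulates the two-parameter Chinese Restaurant process (the paper's baseline, not its new doubly regularly-varying models) and tabulates cluster sizes, whose frequencies decay with a single power-law exponent governed by the discount parameter. Parameter choices are illustrative only.

```python
# Minimal sketch (not the paper's model): the two-parameter CRP baseline.
import numpy as np
from collections import Counter

def sample_crp(n, discount=0.5, concentration=1.0, rng=None):
    """Return per-table customer counts from a two-parameter CRP over n customers."""
    rng = np.random.default_rng() if rng is None else rng
    counts = []  # customers per table
    for i in range(n):
        k = len(counts)
        # P(existing table j) = (n_j - d) / (i + theta); P(new table) = (theta + k d) / (i + theta)
        probs = np.array([c - discount for c in counts] +
                         [concentration + discount * k]) / (i + concentration)
        choice = rng.choice(k + 1, p=probs)
        if choice == k:
            counts.append(1)
        else:
            counts[choice] += 1
    return counts

counts = sample_crp(10_000, discount=0.6, concentration=5.0)
# frequency-of-frequencies: number of clusters of each size; the tail decays
# roughly as a single power law whose exponent is set by the discount parameter
size_freq = Counter(counts)
for size in sorted(size_freq)[:10]:
    print(size, size_freq[size])
```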

Wed 12 June 14:20 - 14:25 PDT

DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures

Andrew R Lawrence · Carl Henrik Ek · Neill Campbell

We present a non-parametric Bayesian latent variable model capable of learning dependency structures across dimensions in a multivariate setting. Our approach is based on flexible Gaussian process priors for the generative mappings and interchangeable Dirichlet process priors to learn the structure. The introduction of the Dirichlet process as a specific structural prior allows our model to circumvent issues associated with previous Gaussian process latent variable models. Inference is performed by deriving an efficient variational bound on the marginal log-likelihood of the model. We demonstrate the efficacy of our approach via analysis of discovered structure and superior quantitative performance on missing data imputation.
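
The Dirichlet process structural prior mentioned above can be illustrated with a truncated stick-breaking draw. The sketch below is a generic illustration of that construction with arbitrary parameter choices; the GP mappings and variational bound of DP-GP-LVM are not shown.

```python
# Minimal sketch, not the DP-GP-LVM implementation: truncated stick-breaking
# weights from a Dirichlet process, used here to group observed dimensions.
import numpy as np

def stick_breaking_weights(alpha, truncation, rng=None):
    """Truncated stick-breaking construction of DP mixture weights."""
    rng = np.random.default_rng() if rng is None else rng
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, truncation=20, rng=rng)
# assign each of D observed dimensions to a latent structure component
D = 10
assignments = rng.choice(len(weights), size=D, p=weights / weights.sum())
print(np.round(weights, 3))
print(assignments)
```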

Wed 12 June 14:25 - 14:30 PDT

Random Function Priors for Correlation Modeling

Aonan Zhang · John Paisley

Many hidden structures underlying high dimensional data can be compactly expressed by a discrete random measure $\xi_n=\sum_{k\in[K]} Z_{nk}\delta_{\theta_k}$, where $(\theta_k)_{k\in[K]}\subset\Theta$ is a collection of hidden atoms shared across observations (indexed by $n$). Previous Bayesian nonparametric methods focus on embedding $\xi_n$ onto alternative spaces to resolve complex atom correlations. However, these methods can be rigid and hard to learn in practice. In this paper, we temporarily ignore the atom space $\Theta$ and embed population random measures $(\xi_n)_{n\in\mathbb{N}}$ altogether as $\xi'$ onto an infinite strip $[0,1]\times\mathbb{R}_+$, where the order of atoms is removed by assuming separate exchangeability. Through a "de Finetti type" result, we can represent $\xi'$ as a coupling of a 2d Poisson process and exchangeable random functions $(f_n)_{n\in\mathbb{N}}$, where each $f_n$ is an object-specific atom sampling function. In this way, we transform the problem from learning complex correlations with discrete random measures into learning complex functions that can be learned with deep neural networks. In practice, we introduce an efficient amortized variational inference algorithm to learn $f_n$ without pain; i.e., no local gradient steps are required during stochastic inference.
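
To make the strip construction concrete, the following sketch draws atoms from a homogeneous 2d Poisson process on $[0,1]\times[0,T]$ and thins them with object-specific functions $f_n$. Here each $f_n$ is an arbitrary random sigmoid rather than the neural network the paper learns, so this is an assumption-laden illustration only.

```python
# Minimal sketch under assumed forms (not the paper's learned model).
import numpy as np

rng = np.random.default_rng(0)
T, rate = 5.0, 4.0                          # strip height and Poisson intensity
num_atoms = rng.poisson(rate * T)           # homogeneous process on [0,1] x [0,T]
atom_u = rng.uniform(0.0, 1.0, num_atoms)   # first coordinate of each atom
atom_s = rng.uniform(0.0, T, num_atoms)     # second coordinate (weight axis)

def make_f(rng):
    """A random 'atom sampling function' f_n: [0,1] x [0,T] -> [0,1] (arbitrary sigmoid)."""
    a, b, c = rng.normal(size=3)
    return lambda u, s: 1.0 / (1.0 + np.exp(-(a * u + b * np.log1p(s) + c)))

for n in range(3):
    f_n = make_f(rng)
    # Z_{nk}: which shared atoms object n picks up, thinned by f_n
    z = rng.uniform(size=num_atoms) < f_n(atom_u, atom_s)
    print(f"object {n}: uses {z.sum()} of {num_atoms} shared atoms")
```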

Wed 12 June 14:30 - 14:35 PDT

Variational Russian Roulette for Deep Bayesian Nonparametrics

Kai Xu · Akash Srivastava · Charles Sutton

Bayesian nonparametric models provide a principled way to automatically adapt the complexity of a model to the amount of data available, but computation in such models is difficult. Amortized variational approximations are appealing because of their computational efficiency, but current methods rely on a fixed finite truncation of the infinite model. This truncation level can be difficult to set, and also interacts poorly with amortized methods due to the over-pruning problem. Instead, we propose a new variational approximation, based on a method from statistical physics called Russian roulette sampling. This allows the variational distribution to adapt its complexity during inference, without relying on a fixed truncation level, and while still obtaining an unbiased estimate of the gradient of the original variational objective. We demonstrate this method on infinite-sized variational auto-encoders using a Beta-Bernoulli (Indian buffet process) prior.
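
The Russian roulette idea itself is easy to illustrate in isolation: randomly truncate an infinite sum and reweight the surviving terms by their survival probabilities so the estimate stays unbiased. The sketch below shows this on a toy geometric series; it is not the paper's variational objective.

```python
# Minimal sketch of Russian roulette sampling: an unbiased estimate of an
# infinite sum via random truncation with importance reweighting.
import numpy as np

def russian_roulette_sum(term, stop_prob=0.2, rng=None, max_terms=10_000):
    """Unbiased estimate of sum_{k=0}^inf term(k) with geometric stopping."""
    rng = np.random.default_rng() if rng is None else rng
    total, survival = 0.0, 1.0
    for k in range(max_terms):
        total += term(k) / survival        # weight by 1 / P(reaching term k)
        if rng.uniform() < stop_prob:      # stop after this term with prob stop_prob
            break
        survival *= 1.0 - stop_prob
    return total

rng = np.random.default_rng(0)
term = lambda k: 0.5 ** k                  # true sum = 2
estimates = [russian_roulette_sum(term, rng=rng) for _ in range(20_000)]
print(np.mean(estimates))                  # close to 2 on average, despite truncation
```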

Wed 12 June 14:35 - 14:40 PDT

Incorporating Grouping Information into Bayesian Decision Tree Ensembles

Junliang Du · Antonio Linero

We consider the problem of nonparametric regression in the high-dimensional setting in which $P \gg N$. We study the use of overlapping group structures to improve prediction and variable selection. These structures arise commonly when analyzing DNA microarray data, where genes can naturally be grouped according to genetic pathways. We incorporate overlapping group structure into a Bayesian additive regression trees model using a prior constructed so that, if a variable from some group is used to construct a split, this increases the probability that subsequent splits will use predictors from the same group. We refer to our model as an overlapping group Bayesian additive regression trees (OG-BART) model, and to our prior on the splits as an overlapping group Dirichlet (OG-Dirichlet) prior. Like the sparse group lasso, our prior encourages sparsity both within and between groups. We study the correlation structure of the prior, illustrate the proposed methodology on simulated data, and apply the methodology to gene expression data to learn which genetic pathways are predictive of breast cancer tumor metastasis.
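
A simplified way to picture the "reuse the same group" behavior is a Pólya-urn-style sampler over groups: the sketch below reinforces a group each time one of its predictors is chosen for a split. The groups, the reinforcement scheme, and the hyperparameter alpha are illustrative assumptions, not the exact OG-Dirichlet prior.

```python
# Simplified illustration (not the exact OG-Dirichlet prior): using a predictor
# from a group raises the probability that later splits reuse that group.
import numpy as np

rng = np.random.default_rng(0)
# hypothetical overlapping groups of predictor indices (e.g., genetic pathways)
groups = [np.array([0, 1, 2]), np.array([2, 3, 4, 5]), np.array([6, 7])]
alpha = 1.0                               # prior mass per group
group_counts = np.zeros(len(groups))      # how often each group has been used

split_vars = []
for _ in range(10):                       # draw 10 split variables sequentially
    group_probs = (alpha + group_counts) / (alpha + group_counts).sum()
    g = rng.choice(len(groups), p=group_probs)
    v = rng.choice(groups[g])             # uniform over predictors in the group
    group_counts[g] += 1                  # reinforcement: same group more likely next
    split_vars.append(int(v))

print(split_vars)
```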

Wed 12 June 14:40 - 15:00 PDT

Variational Implicit Processes

Chao Ma · Yingzhen Li · Jose Miguel Hernandez-Lobato

We introduce the implicit process (IP), a stochastic process that places implicitly defined multivariate distributions over any finite collection of random variables. IPs are therefore highly flexible implicit priors over functions; examples include data simulators, Bayesian neural networks and non-linear transformations of stochastic processes. A novel and efficient function-space approximate Bayesian inference algorithm for IPs, namely the variational implicit process (VIP), is derived using generalised wake-sleep updates. This method returns simple update equations and allows scalable hyper-parameter learning with stochastic optimization. Experiments demonstrate that VIPs return better uncertainty estimates and superior performance over existing inference methods for challenging models such as Bayesian LSTMs, Bayesian neural networks, and Gaussian processes.
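
An implicit process is easy to sample from even when its density is intractable: draw parameters from a prior and evaluate the resulting function at any finite set of inputs. The sketch below does this with a one-hidden-layer Bayesian neural network whose architecture and prior scales are arbitrary choices, not those of the paper.

```python
# Minimal sketch of an implicit process: a BNN with weights drawn from a
# Gaussian prior induces an implicit joint distribution over function values.
import numpy as np

def sample_ip_function_values(x, hidden=50, rng=None):
    """One prior draw of a 1-hidden-layer BNN, evaluated at inputs x (shape [N, 1])."""
    rng = np.random.default_rng() if rng is None else rng
    w1 = rng.normal(0.0, 1.0, size=(1, hidden))
    b1 = rng.normal(0.0, 1.0, size=hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    return np.tanh(x @ w1 + b1) @ w2       # f(x); only samples are accessible,
                                           # the density over f(x) stays implicit

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)[:, None]
draws = np.stack([sample_ip_function_values(x, rng=rng) for _ in range(5)])
print(draws.shape)                         # (5, 100, 1): five prior function samples
```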

Wed 12 June 15:00 - 15:05 PDT

Discovering Latent Covariance Structures for Multiple Time Series

Anh Tong · Jaesik Choi

Analyzing multivariate time series data is important to predict future events and changes of complex systems in finance, manufacturing, and administrative decisions. The expressive power of Gaussian Process (GP) regression methods has been significantly improved by compositional covariance structures. In this paper, we present a new GP model which naturally handles multiple time series by placing an Indian Buffet Process (IBP) prior on the presence of shared kernels. Our selective covariance structure decomposition allows exploiting shared parameters over a set of multiple, selected time series. We also investigate the well-definedness of the models when infinite latent components are introduced. We present a pragmatic search algorithm which explores a larger structure space efficiently. Experiments conducted on five real-world data sets demonstrate that our new model outperforms existing methods in terms of structure discovery and predictive performance.
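
The sketch below illustrates the prior side of this construction: an Indian Buffet Process draw decides which shared kernels each series uses, and each series' covariance is the sum of its active kernels. The squared-exponential components and hyperparameters are assumptions for illustration; the paper's structure search and inference are omitted.

```python
# Minimal sketch, not the paper's model or search algorithm.
import numpy as np

def sample_ibp(num_series, alpha, rng=None):
    """Binary matrix Z (series x kernels) from a standard IBP(alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    Z, counts = [], []
    for n in range(num_series):
        row = [rng.uniform() < c / (n + 1) for c in counts]   # reuse popular kernels
        for j, used in enumerate(row):
            counts[j] += used
        new = rng.poisson(alpha / (n + 1))                    # brand-new kernels
        counts.extend([1] * new)
        Z.append(row + [True] * new)
    K = len(counts)
    return np.array([r + [False] * (K - len(r)) for r in Z])

def rbf(t, lengthscale):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
Z = sample_ibp(num_series=4, alpha=2.0, rng=rng)
lengthscales = rng.uniform(0.05, 0.5, size=Z.shape[1])        # one per shared kernel
covariances = [sum(rbf(t, lengthscales[k]) for k in np.where(Z[n])[0])
               + 1e-6 * np.eye(len(t)) for n in range(Z.shape[0])]
print(Z.astype(int))                                          # which series use which kernels
```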

Wed 12 June 15:05 - 15:10 PDT

Scalable Training of Inference Networks for Gaussian-Process Models

Jiaxin Shi · Mohammad Emtiyaz Khan · Jun Zhu

Inference in Gaussian process (GP) models is computationally challenging for large data, and often difficult to approximate with a small number of inducing points. We explore an alternative approximation that employs stochastic inference networks (e.g., Bayesian neural networks) for flexible inference. Unfortunately, for such networks, minibatch training makes it difficult to learn meaningful correlations over function outputs for a large dataset. We propose an algorithm that enables such training by tracking a stochastic, functional mirror-descent algorithm. At each iteration, this only requires considering a finite number of input locations, resulting in a scalable and easy-to-implement algorithm. Empirical results show comparable and, sometimes, superior performance to existing sparse variational GP methods.
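
The "finite number of input locations per iteration" point can be made concrete with plain GP regression: at each step one only needs the posterior at a small random set of measurement points, which the inference network would then be trained to match. The sketch below computes that finite-location posterior; the mirror-descent training loop itself is omitted.

```python
# Minimal sketch: exact GP posterior at a handful of randomly chosen locations,
# the quantity an inference network would be matched to at each iteration.
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=0.1):
    """GP regression posterior mean and covariance at the query locations."""
    k_tt = rbf_kernel(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    k_qq = rbf_kernel(x_query, x_query)
    solve = np.linalg.solve(k_tt, np.column_stack([y_train, k_qt.T]))
    mean = k_qt @ solve[:, 0]
    cov = k_qq - k_qt @ solve[:, 1:]
    return mean, cov

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 200)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=200)
x_query = rng.uniform(-3, 3, 8)            # a few "measurement" locations per iteration
mean, cov = gp_posterior(x_train, y_train, x_query)
print(mean.shape, cov.shape)
```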

Wed 12 June 15:10 - 15:15 PDT

Bayesian Optimization Meets Bayesian Optimal Stopping

Zhongxiang Dai · Haibin Yu · Bryan Kian Hsiang Low · Patrick Jaillet

Bayesian optimization (BO) is a popular paradigm for optimizing the hyperparameters of machine learning (ML) models due to its sample efficiency. Many ML models require running an iterative training procedure (e.g., stochastic gradient descent). This motivates the question of whether information available during the training process (e.g., validation accuracy after each epoch) can be exploited to improve the epoch efficiency of BO algorithms by early-stopping model training under hyperparameter settings that will end up under-performing, thereby eliminating unnecessary training epochs. This paper proposes to unify BO (specifically, Gaussian process-upper confidence bound (GP-UCB)) with Bayesian optimal stopping (BO-BOS) to boost the epoch efficiency of BO. While GP-UCB is sample-efficient in the number of function evaluations, BOS complements it with epoch efficiency for each function evaluation by providing a principled optimal stopping mechanism for early stopping. BO-BOS preserves the (asymptotic) no-regret performance of GP-UCB using our specified choice of BOS parameters, which is amenable to an elegant interpretation in terms of the exploration-exploitation trade-off. We empirically evaluate the performance of BO-BOS and demonstrate its generality in hyperparameter optimization of ML models and two other interesting applications.
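
The overall loop can be sketched as GP-UCB proposing a hyperparameter and an epoch-wise training run that is abandoned early when it clearly cannot beat the incumbent. In the sketch below the learning curve is simulated and the stopping rule is a crude heuristic standing in for Bayesian optimal stopping, so everything beyond the GP-UCB acquisition is an illustrative assumption.

```python
# Minimal sketch: GP-UCB over one hyperparameter plus a heuristic early-stopping
# rule (a stand-in for BOS) applied to a simulated epoch-wise learning curve.
import numpy as np

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_ucb(x_obs, y_obs, candidates, beta=2.0, noise=0.05):
    """Pick the candidate maximizing the GP-UCB acquisition mean + beta * std."""
    k = rbf(np.array(x_obs), np.array(x_obs)) + noise ** 2 * np.eye(len(x_obs))
    k_star = rbf(candidates, np.array(x_obs))
    mean = k_star @ np.linalg.solve(k, np.array(y_obs))
    var = 1.0 - np.sum(k_star * np.linalg.solve(k, k_star.T).T, axis=1)
    return candidates[np.argmax(mean + beta * np.sqrt(np.clip(var, 0.0, None)))]

def train(hp, incumbent, epochs=50, rng=None):
    """Simulated epoch-wise training; returns the last validation accuracy reached."""
    rng = np.random.default_rng() if rng is None else rng
    ceiling = 0.9 - (hp - 0.3) ** 2        # hypothetical best accuracy for this hp
    acc = 0.0
    for epoch in range(1, epochs + 1):
        acc = ceiling * (1 - np.exp(-epoch / 10)) + 0.01 * rng.normal()
        optimistic_final = acc + ceiling * np.exp(-epoch / 10) + 0.02
        if optimistic_final < incumbent:   # heuristic early stop (BOS in the paper)
            break
    return acc

rng = np.random.default_rng(0)
candidates = np.linspace(0, 1, 101)
x_obs = [0.0, 1.0]
y_obs = [train(x, 0.0, rng=rng) for x in x_obs]
best = max(y_obs)
for _ in range(10):
    x_next = gp_ucb(x_obs, y_obs, candidates)
    y_next = train(x_next, best, rng=rng)
    x_obs.append(float(x_next)); y_obs.append(y_next); best = max(best, y_next)
print(round(max(y_obs), 3), x_obs[int(np.argmax(y_obs))])   # best accuracy and hp found
```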

Wed 12 June 15:15 - 15:20 PDT

Learning interpretable continuous-time models of latent stochastic dynamical systems

Lea Duncker · Gergo Bohner · Julien Boussard · Maneesh Sahani

We develop an approach to learn an interpretable semi-parametric model of a latent continuous-time stochastic dynamical system, assuming noisy high-dimensional outputs sampled at uneven times. The dynamics are described by a nonlinear stochastic differential equation (SDE) driven by a Wiener process, with a drift evolution function drawn from a Gaussian process (GP) conditioned on a set of learnt fixed points and corresponding local Jacobian matrices. This form yields a flexible nonparametric model of the dynamics, with a representation corresponding directly to the interpretable portraits routinely employed in the study of nonlinear dynamical systems. The learning algorithm combines inference of continuous latent paths underlying observed data with a sparse variational description of the dynamical process. We demonstrate our approach on simulated data from different nonlinear dynamical systems.
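
A toy version of the drift parameterization can be simulated directly: the sketch below builds a 2-D drift by linearizing around assumed fixed points with assumed local Jacobians (rather than the GP-conditioned drift the paper learns) and integrates the SDE with Euler-Maruyama.

```python
# Minimal sketch under strong simplifications: nearest-fixed-point linearization
# as the drift of a 2-D SDE, integrated with Euler-Maruyama.
import numpy as np

fixed_points = np.array([[-1.0, 0.0], [1.0, 0.0]])          # assumed fixed points
jacobians = np.array([[[-1.0, -0.5], [0.5, -1.0]],          # assumed local dynamics
                      [[-1.0, 0.5], [-0.5, -1.0]]])

def drift(x):
    """Linearize around the nearest fixed point: f(x) ~ J_i (x - x_i*)."""
    i = np.argmin(np.linalg.norm(fixed_points - x, axis=1))
    return jacobians[i] @ (x - fixed_points[i])

rng = np.random.default_rng(0)
dt, steps, sigma = 0.01, 2000, 0.1
x = np.array([0.5, 1.0])
path = [x]
for _ in range(steps):                                       # Euler-Maruyama steps
    x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.normal(size=2)
    path.append(x)
path = np.array(path)
print(path[-1])                                              # settles near a stable fixed point
```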