Fast Reconstruction of Mixtures of Bernoulli Product Distributions
Sanyam Agarwal ⋅ Pranjal Dutta ⋅ Markus Bläser
Abstract
Mixtures of Bernoulli product distributions are a simple and widely used latent-variable model, with applications in e.g.\ recommendation systems, crowdsourcing, and medical data analysis. We consider the problem of reconstructing the mixture parameters from oracle access to its probability generating polynomial (PGP), for instance represented by a probabilistic generating circuit (PGC). We show that the parameters are uniquely identifiable for almost all mixtures, and give a randomized algorithm that exactly recovers the mixture weights and component marginals for mixtures of $r$ Bernoulli product distributions over $n$ variables using only $O(nr^2)$ oracle queries. The algorithm repeatedly applies restrictions to $O(r)$ variables, extracts low-degree coefficients, and then recovers the parameters using a moment-based tensor decomposition. To the best of our knowledge, this is the {\em first} exact reconstruction algorithm in this PGP oracle model with query complexity linear in $n$ and polynomial in $r$.
Successful Page Load