

Session: Matrix Factorization 1


Thu 12 July 2:00 - 2:20 PDT

Probabilistic Boolean Tensor Decomposition

Tammo Rukat · Christopher Holmes · Christopher Yau

Boolean tensor decomposition approximates data of multi-way binary relationships as a product of interpretable low-rank binary factors, following the rules of Boolean algebra. Here, we present its first probabilistic treatment. We facilitate scalable sampling-based posterior inference by exploiting the combinatorial structure of the factor conditionals. Maximum a posteriori estimates consistently outperform existing non-probabilistic approaches. We show that our performance gains can partially be explained by convergence to solutions that occupy relatively large regions of the parameter space, as well as by implicit model averaging. Moreover, the Bayesian treatment facilitates model selection with much greater accuracy than the previously suggested minimum description length based approach. We investigate three real-world data sets. First, temporal interaction networks and behavioural data of university students demonstrate the inference of instructive latent patterns. Next, we decompose a tensor with more than 10 billion data points, indicating relations of gene expression in cancer patients. Not only does this demonstrate scalability, it also provides an entirely novel perspective on relational properties of continuous data and, in the present example, on the molecular heterogeneity of cancer. Our implementation is available on GitHub: https://github.com/TammoR/LogicalFactorisationMachines
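
As a concrete reference for the Boolean (OR-AND) tensor product the abstract builds on, here is a minimal NumPy sketch; the shapes and rank are illustrative, and the paper's actual probabilistic inference lives in the linked repository and is not reproduced here.

```python
import numpy as np

def boolean_tensor_product(A, B, C):
    """Boolean (OR-AND) product of three binary factor matrices.

    A: (I, R), B: (J, R), C: (K, R) binary arrays; returns X with
    X[i, j, k] = OR over r of (A[i, r] AND B[j, r] AND C[k, r]).
    """
    # einsum counts the matching triples per (i, j, k);
    # any positive count means the Boolean OR is true
    counts = np.einsum('ir,jr,kr->ijk', A, B, C)
    return (counts > 0).astype(np.int8)

# Usage: a random rank-2 binary tensor
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(5, 2))
B = rng.integers(0, 2, size=(4, 2))
C = rng.integers(0, 2, size=(3, 2))
X = boolean_tensor_product(A, B, C)   # shape (5, 4, 3)
```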

Thu 12 July 2:20 - 2:30 PDT

A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery

Xiao Zhang · Lingxiao Wang · Yaodong Yu · Quanquan Gu

We propose a primal-dual based framework for analyzing the global optimality of nonconvex low-rank matrix recovery. Our analysis is based on restricted strong convexity and smoothness conditions, which can be verified for a broad family of loss functions. In addition, our analytic framework can directly handle the widely-used incoherence constraints through the lens of duality. We illustrate the applicability of the proposed framework to matrix completion and one-bit matrix completion, and prove that these problems have no spurious local minima. Our results not only improve the sample complexity required for characterizing the global optimality of matrix completion, but also resolve an open problem in Ge et al. (2017) regarding one-bit matrix completion. Numerical experiments show that the primal-dual based algorithm can successfully recover the global optimum for various low-rank problems.
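
For context, the following is a minimal sketch of the nonconvex factored formulation of matrix completion that such analyses concern. It uses plain gradient descent rather than the paper's primal-dual algorithm, and the step size, iteration count, and initialization scale are illustrative assumptions.

```python
import numpy as np

def matrix_completion_gd(M_obs, mask, rank, step=0.01, iters=2000, seed=0):
    """Gradient descent on the factored nonconvex objective
        f(U, V) = 0.5 * || mask * (U @ V.T - M_obs) ||_F^2.
    Illustrative only; NOT the paper's primal-dual algorithm.
    """
    rng = np.random.default_rng(seed)
    n1, n2 = M_obs.shape
    U = 0.1 * rng.standard_normal((n1, rank))
    V = 0.1 * rng.standard_normal((n2, rank))
    for _ in range(iters):
        R = mask * (U @ V.T - M_obs)   # residual on observed entries only
        U, V = U - step * (R @ V), V - step * (R.T @ U)
    return U, V

# Usage: recover a rank-2 matrix from roughly half of its entries
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
mask = (rng.random(M.shape) < 0.5).astype(float)
U, V = matrix_completion_gd(M * mask, mask, rank=2)
```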

Thu 12 July 2:30 - 2:40 PDT

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion

Cong Ma · Kaizheng Wang · Yuejie Chi · Yuxin Chen

Recent years have seen a flurry of activity in designing provably efficient nonconvex optimization procedures for solving statistical estimation problems. For various problems like phase retrieval or low-rank matrix completion, state-of-the-art nonconvex procedures require proper regularization (e.g., trimming, regularized cost, projection) in order to guarantee fast convergence. When it comes to vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in several nonconvex problems: even in the absence of explicit regularization, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on two statistical estimation problems, i.e., solving random quadratic systems of equations and low-rank matrix completion, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent enables optimal control of both entrywise and spectral-norm errors.
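
As a self-contained illustration of the "vanilla gradient descent" setting analyzed here, the sketch below solves a real-valued random quadratic system (phase retrieval) with spectral initialization and a constant step size, with no trimming or other explicit regularization. The step-size constant and iteration count are illustrative assumptions, and recovery is only up to global sign.

```python
import numpy as np

def phase_retrieval_gd(A, y, mu=0.1, iters=500):
    """Vanilla gradient descent for the real quadratic system y_i = (a_i^T x)^2,
    minimizing f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2.
    No trimming or explicit regularization; recovery is up to global sign.
    """
    m, n = A.shape
    # Spectral initialization: leading eigenvector of (1/m) sum_i y_i a_i a_i^T
    Y = (A * y[:, None]).T @ A / m
    _, V = np.linalg.eigh(Y)
    x = V[:, -1] * np.sqrt(y.mean())   # y.mean() estimates ||x_true||^2
    step = mu / y.mean()               # Wirtinger-flow-style constant step
    for _ in range(iters):
        Ax = A @ x
        x = x - step * (A.T @ ((Ax**2 - y) * Ax) / m)
    return x

# Usage: m = 8n Gaussian measurements of a random signal
rng = np.random.default_rng(2)
n = 50
x_true = rng.standard_normal(n)
A = rng.standard_normal((8 * n, n))
x_hat = phase_retrieval_gd(A, (A @ x_true) ** 2)
```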

Thu 12 July 2:40 - 2:50 PDT

Learning Binary Latent Variable Models: A Tensor Eigenpair Approach

Ariel Jaffe · Roi Weiss · Boaz Nadler · Shai Carmi · Yuval Kluger

Latent variable models with hidden binary units appear in various applications. Learning such models, in particular in the presence of noise, is a challenging computational problem. In this paper we propose a novel spectral approach to this problem, based on the eigenvectors of both the second order moment matrix and third order moment tensor of the observed data. We prove that under mild non-degeneracy conditions, our method consistently estimates the model parameters at the optimal parametric rate. Our tensor-based method generalizes previous orthogonal tensor decomposition approaches, where the hidden units were assumed to be either statistically independent or mutually exclusive. We illustrate the consistency of our method on simulated data and demonstrate its usefulness in learning a common model for population mixtures in genetics.
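
For orientation, the sketch below computes the empirical second- and third-order moments that serve as inputs to such spectral methods; the paper's actual estimator involves further steps (whitening, tensor eigenpair computation) that are not reproduced here.

```python
import numpy as np

def empirical_moments(X):
    """Empirical raw moments of data X with shape (m samples, d dimensions):
        M2[a, b]    = (1/m) * sum_n X[n, a] * X[n, b]
        M3[a, b, c] = (1/m) * sum_n X[n, a] * X[n, b] * X[n, c]
    These are the basic inputs to moment-based spectral estimators;
    note that M3 costs O(m * d^3) time and O(d^3) memory.
    """
    m = X.shape[0]
    M2 = X.T @ X / m
    M3 = np.einsum('na,nb,nc->abc', X, X, X) / m
    return M2, M3
```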

Thu 12 July 2:50 - 3:00 PDT

Closed-form Marginal Likelihood in Gamma-Poisson Matrix Factorization

Louis Filstroff · Alberto Lumbreras · Cedric Fevotte

We present a novel analysis of the Gamma-Poisson (GaP) model, a probabilistic matrix factorization model for count data. We show that GaP can be rewritten free of the score/activation matrix. This yields new insights into the estimation of the topic/dictionary matrix by maximum marginal likelihood estimation. In particular, it explains the robustness of this estimator to over-specified values of the factorization rank, and especially its ability to automatically prune irrelevant dictionary columns, as empirically observed in previous work. The marginalization of the activation matrix leads in turn to a new Monte Carlo Expectation-Maximization algorithm with favorable properties.
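
A minimal generative sketch of the GaP model described above, assuming Gamma priors with illustrative hyperparameters; the paper's closed-form marginalization of the activation matrix is its contribution and is not reproduced here.

```python
import numpy as np

def sample_gap(F, N, K, alpha=1.0, beta=1.0, seed=0):
    """Generative sketch of the Gamma-Poisson (GaP) model for count data:
        H[k, n] ~ Gamma(alpha, rate=beta)   (activations, which the paper
                                             marginalizes out in closed form)
        V[f, n] ~ Poisson((W @ H)[f, n])    (observed counts)
    W is the topic/dictionary matrix; drawn randomly here for illustration,
    whereas the paper estimates it by maximum marginal likelihood.
    """
    rng = np.random.default_rng(seed)
    W = rng.gamma(shape=1.0, scale=1.0, size=(F, K))
    H = rng.gamma(shape=alpha, scale=1.0 / beta, size=(K, N))
    V = rng.poisson(W @ H)
    return V, W, H

# Usage: 100 documents over 500 features with K = 5 components
V, W, H = sample_gap(F=500, N=100, K=5)
```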