Full Regularization Path for Sparse Principal Component Analysis

Alexandre d'Aspremont - Princeton University, USA
Francis R. Bach - Ecole des Mines de Paris, France
Laurent El Ghaoui - U.C. Berkeley, USA

Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for every number of nonzero coefficients, with complexity O(n^3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n^3). We show on toy examples and biological data that our algorithm does provide globally optimal solutions in many cases.
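The idea of a greedy path over all cardinalities can be illustrated with a short NumPy sketch. This is an assumption-laden illustration, not the authors' algorithm: the function name greedy_sparse_pca_path is hypothetical, and the selection rule used here (add the variable that most increases the largest eigenvalue of the covariance submatrix) is plain greedy forward selection. It approximates, for each k, the problem of maximizing x^T Σ x subject to ||x||_2 = 1 and at most k nonzero entries in x, where the optimal value for a fixed support I equals the largest eigenvalue of Σ restricted to I.

```python
import numpy as np


def greedy_sparse_pca_path(sigma):
    """Greedy forward selection for sparse PCA (illustrative sketch only).

    Returns a list of (support, variance) pairs, one per cardinality
    k = 1..n, where variance is lambda_max(sigma[I, I]) on support I.
    Hypothetical helper: not the selection rule or update scheme of the paper.
    """
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.shape[0]
    # The best 1-sparse solution picks the variable with the largest variance.
    support = [int(np.argmax(np.diag(sigma)))]
    path = [(list(support), float(sigma[support[0], support[0]]))]
    while len(support) < n:
        best_i, best_val = None, -np.inf
        for i in range(n):
            if i in support:
                continue
            idx = support + [i]
            # eigvalsh returns eigenvalues in ascending order; take the largest.
            val = np.linalg.eigvalsh(sigma[np.ix_(idx, idx)])[-1]
            if val > best_val:
                best_i, best_val = i, val
        support.append(best_i)
        path.append((list(support), float(best_val)))
    return path
```

For a data matrix A, calling greedy_sparse_pca_path(A.T @ A) returns n (support, variance) pairs whose variances never decrease with k, by eigenvalue interlacing. Because this sketch recomputes an eigenvalue for every candidate at every step, it costs far more than the O(n^3) stated above; reaching that bound would require cheaper per-step updates than the ones assumed here.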