

Poster

Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

Romain COUILLET · Florent Chatelain · Nicolas Le Bihan

Virtual

Keywords: [ Statistical Learning Theory ]


Abstract: The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists in randomly "puncturing" both the data matrix $X \in \mathbb{C}^{p\times n}$ (or $\mathbb{R}^{p\times n}$) and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks: $S \in \{0,1\}^{p\times n}$ for $X$ and $B \in \{0,1\}^{n\times n}$ for $K$. The resulting "two-way punctured" kernel is thus given by $K = \frac{1}{p}\left[(X \odot S)^{\mathsf{H}}(X \odot S)\right] \odot B$. We demonstrate that, for $X$ composed of independent columns drawn from a Gaussian mixture model, as $n,p \to \infty$ with $p/n \to c_0 \in (0,\infty)$, the spectral behavior of $K$ -- its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors -- is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on various image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering or PCA) performance. This preliminary study opens as such the path towards rethinking, from a large dimensional standpoint, computational and storage costs in elementary machine learning models.
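The two-way puncturing construction described above can be sketched in a few lines of NumPy. This is only an illustrative sketch of the formula $K = \frac{1}{p}[(X \odot S)^{\mathsf{H}}(X \odot S)] \odot B$ for the real case; the dimensions, the Bernoulli keep-probabilities `eps_s` and `eps_b`, and the choice to symmetrize $B$ are assumptions for the demo, not the authors' exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and mask sparsity levels (not from the paper).
p, n = 200, 100          # feature dimension, number of samples
eps_s, eps_b = 0.5, 0.5  # Bernoulli keep-probabilities for S and B

X = rng.standard_normal((p, n))        # data matrix (real case)
S = rng.random((p, n)) < eps_s         # Bernoulli mask for the data
B = rng.random((n, n)) < eps_b         # Bernoulli mask for the kernel
B = np.triu(B) | np.triu(B, 1).T       # symmetrize so K remains symmetric

XS = X * S                             # entry-wise punctured data X ⊙ S
K = ((XS.T @ XS) / p) * B              # two-way punctured kernel

# Spectral clustering / PCA would then operate on the sparse K,
# storing and multiplying only its surviving entries.
```

With `eps_s = eps_b = 0.5`, only about a quarter of the data/kernel products are ever formed, which is the source of the computational and storage gains the abstract refers to.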
