ICML Poster Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

Romain COUILLET · Florent Chatelain · Nicolas Le Bihan

Virtual

Keywords: [ Statistical Learning Theory ]

Abstract: The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists in randomly "puncturing" both the data matrix $X\in\mathbb{C}^{p\times n}$ (or $\mathbb{R}^{p\times n}$) and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks: $S\in\{0,1\}^{p\times n}$ for $X$ and $B\in\{0,1\}^{n\times n}$ for $K$. The resulting "two-way punctured" kernel is thus given by $K=\frac{1}{p}[(X\odot S)^{\mathsf{H}}(X\odot S)]\odot B$. We demonstrate that, for $X$ composed of independent columns drawn from a Gaussian mixture model, as $n,p\to\infty$ with $p/n\to c_0\in(0,\infty)$, the spectral behavior of $K$ (its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors) is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on various image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering or PCA) performance. This preliminary study opens as such the path towards rethinking, from a large dimensional standpoint, computational and storage costs in elementary machine learning models.
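To make the construction concrete, the following is a minimal NumPy sketch of the two-way punctured kernel $K=\frac{1}{p}[(X\odot S)^{\mathsf{H}}(X\odot S)]\odot B$ on a toy two-class Gaussian mixture. It is not the authors' code. The function name `two_way_punctured_kernel`, the keep-probabilities `eps_S` and `eps_B`, the choice of a symmetric mask $B$ with an all-ones diagonal, and the toy dimensions and class means are all assumptions made for illustration; the abstract only states that $S$ and $B$ are Bernoulli masks.

```python
import numpy as np


def two_way_punctured_kernel(X, eps_S, eps_B, rng):
    """Return K = (1/p) [(X*S)^H (X*S)] * B with Bernoulli masks S and B.

    eps_S and eps_B are the probabilities of KEEPING an entry
    (hypothetical parameter names, not taken from the source).
    """
    p, n = X.shape
    # Data mask S: i.i.d. Bernoulli(eps_S) entries, same shape as X.
    S = rng.random((p, n)) < eps_S
    # Kernel mask B: assumed symmetric so that K stays Hermitian,
    # with the diagonal always kept (both are assumptions).
    upper = np.triu(rng.random((n, n)) < eps_B, k=1)
    B = upper | upper.T | np.eye(n, dtype=bool)
    Xs = X * S                      # entrywise (Hadamard) puncturing of X
    K = (Xs.conj().T @ Xs) / p      # Gram matrix of the punctured data
    return K * B                    # entrywise puncturing of the kernel


# Toy data: two Gaussian classes with means +mu and -mu (illustrative only).
rng = np.random.default_rng(0)
p, n = 200, 400
labels = np.repeat([0, 1], n // 2)
mu = np.zeros(p)
mu[0] = 3.0
X = rng.standard_normal((p, n)) + np.outer(mu, 2 * labels - 1)

K = two_way_punctured_kernel(X, eps_S=0.5, eps_B=0.5, rng=rng)

# Spectral clustering sketch: threshold the dominant eigenvector at zero.
eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
top = eigvecs[:, -1]
pred = (top > 0).astype(int)
agreement = np.mean(pred == labels)
accuracy = max(agreement, 1 - agreement)  # eigenvector sign is arbitrary
print(f"clustering accuracy on toy data: {accuracy:.3f}")
```

In this sketch the dense product is still formed before masking, so it illustrates the definition rather than the savings. An implementation aiming at the cost reductions discussed in the abstract would presumably store `Xs` and `K` in sparse form and only compute the kernel entries retained by `B`.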
