ICML 2021 Dimensionality Reduction for the Sum-of-Distances Metric Oral

Dimensionality Reduction for the Sum-of-Distances Metric

[ Abstract ] [ Visit Unsupervised Learning 2 ] [ Paper ]

[ Paper ]

Abstract: We give a dimensionality reduction procedure to approximate the sum of distances of a given set of

$n$ points in

$R^d$ to any

shape'' that lies in a

$k$ -dimensional subspace. Here, by

shape'' we mean any set of points in

$R^d$ . Our algorithm takes an input in the form of an

$n \times d$ matrix

$A$ , where each row of

$A$ denotes a data point, and outputs a subspace

$P$ of dimension

$O(k^{3}/\epsilon^6)$ such that the projections of each of the

$n$ points onto the subspace

$P$ and the distances of each of the points to the subspace

$P$ are sufficient to obtain an

$\epsilon$ -approximation to the sum of distances to any arbitrary shape that lies in a

$k$ -dimensional subspace of

$R^d$ . These include important problems such as

$k$ -median,

$k$ -subspace approximation, and

$(j,l)$ subspace clustering with

$j \cdot l \leq k$ . Dimensionality reduction reduces the data storage requirement to

$(n+d)k^{3}/\epsilon^6$ from nnz

$(A)$ . Here nnz

$(A)$ could potentially be as large as

$nd$ . Our algorithm runs in time nnz

$(A)/\epsilon^2 + (n+d)$ poly

$(k/\epsilon)$ , up to logarithmic factors. For dense matrices, where nnz

$(A) \approx nd$ , we give a faster algorithm, that runs in time

$nd + (n+d)$ poly

$(k/\epsilon)$ up to logarithmic factors. Our dimensionality reduction algorithm can also be used to obtain poly

$(k/\epsilon)$ size coresets for

$k$ -median and

$(k,1)$ -subspace approximation problems in polynomial time.

Chat is not available.