Processing math: 100%
Skip to yearly menu bar Skip to main content


Oral

Dimensionality Reduction for the Sum-of-Distances Metric

Zhili Feng · Praneeth Kacham · David Woodruff

Abstract: We give a dimensionality reduction procedure to approximate the sum of distances of a given set of n points in Rd to any shape'' that lies in a k-dimensional subspace. Here, by shape'' we mean any set of points in Rd. Our algorithm takes an input in the form of an n×d matrix A, where each row of A denotes a data point, and outputs a subspace P of dimension O(k3/ϵ6) such that the projections of each of the n points onto the subspace P and the distances of each of the points to the subspace P are sufficient to obtain an ϵ-approximation to the sum of distances to any arbitrary shape that lies in a k-dimensional subspace of Rd. These include important problems such as k-median, k-subspace approximation, and (j,l) subspace clustering with jlk. Dimensionality reduction reduces the data storage requirement to (n+d)k3/ϵ6 from nnz(A). Here nnz(A) could potentially be as large as nd. Our algorithm runs in time nnz(A)/ϵ2+(n+d)poly(k/ϵ), up to logarithmic factors. For dense matrices, where nnz(A)nd, we give a faster algorithm, that runs in time nd+(n+d)poly(k/ϵ) up to logarithmic factors. Our dimensionality reduction algorithm can also be used to obtain poly(k/ϵ) size coresets for k-median and (k,1)-subspace approximation problems in polynomial time.

Chat is not available.