Timezone: »

Dimensionality Reduction for the Sum-of-Distances Metric
Zhili Feng · Praneeth Kacham · David Woodruff

Thu Jul 22 06:00 PM -- 06:20 PM (PDT) @
We give a dimensionality reduction procedure to approximate the sum of distances of a given set of $n$ points in $R^d$ to any ``shape'' that lies in a $k$-dimensional subspace. Here, by ``shape'' we mean any set of points in $R^d$. Our algorithm takes an input in the form of an $n \times d$ matrix $A$, where each row of $A$ denotes a data point, and outputs a subspace $P$ of dimension $O(k^{3}/\epsilon^6)$ such that the projections of each of the $n$ points onto the subspace $P$ and the distances of each of the points to the subspace $P$ are sufficient to obtain an $\epsilon$-approximation to the sum of distances to any arbitrary shape that lies in a $k$-dimensional subspace of $R^d$. These include important problems such as $k$-median, $k$-subspace approximation, and $(j,l)$ subspace clustering with $j \cdot l \leq k$. Dimensionality reduction reduces the data storage requirement to $(n+d)k^{3}/\epsilon^6$ from nnz$(A)$. Here nnz$(A)$ could potentially be as large as $nd$. Our algorithm runs in time nnz$(A)/\epsilon^2 + (n+d)$poly$(k/\epsilon)$, up to logarithmic factors. For dense matrices, where nnz$(A) \approx nd$, we give a faster algorithm, that runs in time $nd + (n+d)$poly$(k/\epsilon)$ up to logarithmic factors. Our dimensionality reduction algorithm can also be used to obtain poly$(k/\epsilon)$ size coresets for $k$-median and $(k,1)$-subspace approximation problems in polynomial time.

Author Information

Zhili Feng (Carnegie Mellon University)
Praneeth Kacham (Carnegie Mellon University)
David Woodruff (Carnegie Mellon University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors