Spotlight
Single Pass Entrywise-Transformed Low Rank Approximation
Yifei Jiang · Yi Li · Yiming Sun · Jiaxin Wang · David Woodruff
In applications such as natural language processing or computer vision, one is given a large $n \times n$ matrix $A = (a_{i,j})$ and would like to compute a matrix decomposition, e.g., a low rank approximation, of a function $f(A) = (f(a_{i,j}))$ applied entrywise to $A$. A very important special case is the likelihood function $f(x) = \log(|x| + 1)$. A natural way to do this would be to simply apply $f$ to each entry of $A$, and then compute the matrix decomposition, but this requires storing all of $A$ as well as multiple passes over its entries. Recent work of Liang et al. shows how to find a rank-$k$ factorization of $f(A)$ using only $n \cdot \operatorname{poly}(\epsilon^{-1}k\log n)$ words of memory, with overall error $10\|f(A)-[f(A)]_k\|_F^2 + \operatorname{poly}(\epsilon/k) \|f(A)\|_{1,2}^2$, where $[f(A)]_k$ is the best rank-$k$ approximation to $f(A)$ and $\|f(A)\|_{1,2}^2$ is the square of the sum of Euclidean lengths of rows of $f(A)$. Their algorithm uses $3$ passes over the entries of $A$. The authors pose the open question of obtaining an algorithm with $n \cdot \operatorname{poly}(\epsilon^{-1}k\log n)$ words of memory using only a single pass over the entries of $A$.
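To make the quantities in the abstract concrete, here is a minimal NumPy sketch of the "natural" baseline only: it stores all of $f(A)$ and computes a full SVD, so it is not the single-pass algorithm of the paper nor the method of Liang et al. The helper names (`entrywise_log`, `best_rank_k`, `norm_12_squared`) and the random Poisson test matrix are assumptions made for illustration.

```python
import numpy as np

def entrywise_log(A):
    """Apply f(x) = log(|x| + 1) to every entry of A."""
    return np.log1p(np.abs(A))

def best_rank_k(M, k):
    """Best rank-k approximation [M]_k in Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def norm_12_squared(M):
    """Square of the sum of the Euclidean lengths of the rows of M."""
    return np.linalg.norm(M, axis=1).sum() ** 2

# Hypothetical input: a small count-like matrix (illustrative data only).
rng = np.random.default_rng(0)
n, k = 200, 5
A = rng.poisson(3.0, size=(n, n)).astype(float)

FA = entrywise_log(A)                 # f(A), stored in full (the baseline's cost)
FA_k = best_rank_k(FA, k)             # [f(A)]_k

tail = np.linalg.norm(FA - FA_k, "fro") ** 2   # ||f(A) - [f(A)]_k||_F^2
frob = np.linalg.norm(FA, "fro") ** 2          # ||f(A)||_F^2
n12 = norm_12_squared(FA)                      # ||f(A)||_{1,2}^2

print("tail error:", tail)
print("||f(A)||_F^2:", frob)
print("||f(A)||_{1,2}^2:", n12)
```

The baseline needs on the order of $n^2$ words of memory, which is the cost the streaming algorithms discussed here are designed to avoid.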
In this paper we resolve this open question, obtaining the first single-pass algorithm for this problem and for the same class of functions $f$ studied by Liang et al. Moreover, our error is $\|f(A)-[f(A)]_k\|_F^2 + \operatorname{poly}(\epsilon/k) \|f(A)\|_F^2$, where $\|f(A)\|_F^2$ is the sum of squares of Euclidean lengths of rows of $f(A)$. Thus our error is significantly smaller: it removes the factor of $10$, and $\|f(A)\|_F^2 \leq \|f(A)\|_{1,2}^2$.
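The inequality between the two norms follows from expanding the square of a sum of nonnegative terms. As a short worked derivation, write $r_i$ for the $i$-th row of $f(A)$ (a symbol introduced here for illustration, not taken from the abstract):

$$
\|f(A)\|_{1,2}^2 = \Big(\sum_{i=1}^{n} \|r_i\|_2\Big)^2 = \sum_{i=1}^{n} \|r_i\|_2^2 + \sum_{i \neq j} \|r_i\|_2 \, \|r_j\|_2 \;\geq\; \sum_{i=1}^{n} \|r_i\|_2^2 = \|f(A)\|_F^2 .
$$

Equality holds only when at most one row of $f(A)$ is nonzero. At the other extreme, when all rows have equal length, $\|f(A)\|_{1,2}^2 = n \, \|f(A)\|_F^2$, so the gap between the two additive error terms can be as large as a factor of $n$.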
Author Information
Yifei Jiang (Tianjin University)
Yi Li (Nanyang Technological University)
Yiming Sun (Nanyang Technological University)
Jiaxin Wang (Wuhan University of Technology)
David Woodruff (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Single Pass Entrywise-Transformed Low Rank Approximation
  Fri. Jul 23rd 04:00 -- 06:00 AM Room
More from the Same Authors
- 2022 Poster: Sketching Algorithms and Lower Bounds for Ridge Regression
  Praneeth Kacham · David Woodruff
- 2022 Poster: Learning Augmented Binary Search Trees
  Honghao Lin · Tian Luo · David Woodruff
- 2022 Poster: Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time
  David Woodruff · Amir Zandieh
- 2022 Spotlight: Sketching Algorithms and Lower Bounds for Ridge Regression
  Praneeth Kacham · David Woodruff
- 2022 Spotlight: Learning Augmented Binary Search Trees
  Honghao Lin · Tian Luo · David Woodruff
- 2022 Spotlight: Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time
  David Woodruff · Amir Zandieh
- 2022 Poster: Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis
  Alexander Munteanu · Simon Omlor · Zhao Song · David Woodruff
- 2022 Spotlight: Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis
  Alexander Munteanu · Simon Omlor · Zhao Song · David Woodruff
- 2022 Poster: Online Active Regression
  Cheng Chen · Yi Li · Yiming Sun
- 2022 Oral: Online Active Regression
  Cheng Chen · Yi Li · Yiming Sun
- 2021 Poster: Fast Sketching of Polynomial Kernels of Polynomial Degree
  Zhao Song · David Woodruff · Zheng Yu · Lichen Zhang
- 2021 Poster: Dimensionality Reduction for the Sum-of-Distances Metric
  Zhili Feng · Praneeth Kacham · David Woodruff
- 2021 Poster: Streaming and Distributed Algorithms for Robust Column Subset Selection
  Shuli Jiang · Dongyu Li · Irene Mengze Li · Arvind Mahankali · David Woodruff
- 2021 Spotlight: Streaming and Distributed Algorithms for Robust Column Subset Selection
  Shuli Jiang · Dongyu Li · Irene Mengze Li · Arvind Mahankali · David Woodruff
- 2021 Spotlight: Fast Sketching of Polynomial Kernels of Polynomial Degree
  Zhao Song · David Woodruff · Zheng Yu · Lichen Zhang
- 2021 Oral: Dimensionality Reduction for the Sum-of-Distances Metric
  Zhili Feng · Praneeth Kacham · David Woodruff
- 2021 Poster: In-Database Regression in Input Sparsity Time
  Rajesh Jayaram · Alireza Samadian · David Woodruff · Peng Ye
- 2021 Poster: Oblivious Sketching for Logistic Regression
  Alexander Munteanu · Simon Omlor · David Woodruff
- 2021 Spotlight: Oblivious Sketching for Logistic Regression
  Alexander Munteanu · Simon Omlor · David Woodruff
- 2021 Spotlight: In-Database Regression in Input Sparsity Time
  Rajesh Jayaram · Alireza Samadian · David Woodruff · Peng Ye
- 2020 Poster: Input-Sparsity Low Rank Approximation in Schatten Norm
  Yi Li · David Woodruff
- 2020 Poster: Nearly Linear Row Sampling Algorithm for Quantile Regression
  Yi Li · Ruosong Wang · Lin Yang · Hanrui Zhang