Timezone: »
Poster
In-Database Regression in Input Sparsity Time
Rajesh Jayaram · Alireza Samadian · David Woodruff · Peng Ye
Sketching is a powerful dimensionality reduction technique for accelerating algorithms for data analysis. A crucial step in sketching methods is to compute a subspace embedding (SE) for a large matrix $A \in \mathbb{R}^{N \times d}$. SE's are the primary tool for obtaining extremely efficient solutions for many linear-algebraic tasks, such as least squares regression and low rank approximation. Computing an SE often requires an explicit representation of $A$ and running time proportional to the size of $A$. However, if $A= T_1 \Join T_2 \Join \dots \Join T_m$ is the result of a database join query on several smaller tables $T_i \in \mathbb{R}^{n_i \times d_i}$, then this running time can be prohibitive, as $A$ itself can have as many as $O(n_1 n_2 \cdots n_m)$ rows. In this work, we design subspace embeddings for database joins which can be computed significantly faster than computing the join. For the case of a two table join $A = T_1 \Join T_2$ we give input-sparsity algorithms for computing subspace embeddings, with running time bounded by the number of non-zero entries in $T_1,T_2$. This results in input-sparsity time algorithms for high accuracy regression, significantly improving upon the running time of prior FAQ-based methods for regression. We extend our results to arbitrary joins for the ridge regression problem, also considerably improving the running time of prior methods. Empirically, we apply our method to real datasets and show that it is significantly faster than existing algorithms.
Author Information
Rajesh Jayaram (Carnegie Mellon University)
Alireza Samadian (University of Pittsburgh)
David Woodruff (Carnegie Mellon University)
Peng Ye (Tsinghua University)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: In-Database Regression in Input Sparsity Time »
Thu. Jul 22nd 01:20 -- 01:25 PM Room
More from the Same Authors
-
2022 Poster: Sketching Algorithms and Lower Bounds for Ridge Regression »
Praneeth Kacham · David Woodruff -
2022 Poster: Learning Augmented Binary Search Trees »
Honghao Lin · Tian Luo · David Woodruff -
2022 Poster: Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time »
David Woodruff · Amir Zandieh -
2022 Spotlight: Sketching Algorithms and Lower Bounds for Ridge Regression »
Praneeth Kacham · David Woodruff -
2022 Spotlight: Learning Augmented Binary Search Trees »
Honghao Lin · Tian Luo · David Woodruff -
2022 Spotlight: Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time »
David Woodruff · Amir Zandieh -
2022 Poster: Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis »
Alexander Munteanu · Simon Omlor · Zhao Song · David Woodruff -
2022 Spotlight: Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis »
Alexander Munteanu · Simon Omlor · Zhao Song · David Woodruff -
2021 Poster: Single Pass Entrywise-Transformed Low Rank Approximation »
Yifei Jiang · Yi Li · Yiming Sun · Jiaxin Wang · David Woodruff -
2021 Poster: Fast Sketching of Polynomial Kernels of Polynomial Degree »
Zhao Song · David Woodruff · Zheng Yu · Lichen Zhang -
2021 Poster: Dimensionality Reduction for the Sum-of-Distances Metric »
Zhili Feng · Praneeth Kacham · David Woodruff -
2021 Poster: Streaming and Distributed Algorithms for Robust Column Subset Selection »
Shuli Jiang · Dongyu Li · Irene Mengze Li · Arvind Mahankali · David Woodruff -
2021 Spotlight: Streaming and Distributed Algorithms for Robust Column Subset Selection »
Shuli Jiang · Dongyu Li · Irene Mengze Li · Arvind Mahankali · David Woodruff -
2021 Spotlight: Fast Sketching of Polynomial Kernels of Polynomial Degree »
Zhao Song · David Woodruff · Zheng Yu · Lichen Zhang -
2021 Spotlight: Single Pass Entrywise-Transformed Low Rank Approximation »
Yifei Jiang · Yi Li · Yiming Sun · Jiaxin Wang · David Woodruff -
2021 Oral: Dimensionality Reduction for the Sum-of-Distances Metric »
Zhili Feng · Praneeth Kacham · David Woodruff -
2021 Poster: Oblivious Sketching for Logistic Regression »
Alexander Munteanu · Simon Omlor · David Woodruff -
2021 Spotlight: Oblivious Sketching for Logistic Regression »
Alexander Munteanu · Simon Omlor · David Woodruff -
2020 Poster: Input-Sparsity Low Rank Approximation in Schatten Norm »
Yi Li · David Woodruff