Histograms, i.e., piece-wise constant approximations, are a popular tool for representing data distributions. Traditionally, the difference between a histogram and the underlying distribution (the approximation error) is measured using the L_p norm, which sums the differences between the two functions over all items in the domain. Although useful in many applications, this error measure has the drawback that it treats the approximation errors of all items in the same way, irrespective of whether an item's mass matters for the downstream application that uses the approximation. As a result, even relatively simple distributions cannot be approximated by succinct histograms without incurring large error.

In this paper, we address this issue by adapting the definition of approximation so that only the errors of the items that belong to the support of the distribution are considered. Under this definition, we develop efficient 1-pass and 2-pass streaming algorithms that compute near-optimal histograms in sub-linear space. We also present lower bounds on the space complexity of this problem. Surprisingly, under this notion of error, there is an exponential gap in the space complexity between 1-pass and 2-pass streaming algorithms. Finally, we demonstrate the utility of our algorithms on a collection of real and synthetic data sets.
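The distinction between the two error measures can be illustrated with a toy example (this is only a sketch of the definitions, not the paper's algorithms; the distribution, histogram, and variable names below are hypothetical):

```python
import numpy as np

# Toy distribution over a domain of n items, with all mass on a small support.
n = 16
dist = np.zeros(n)
dist[[2, 3, 9]] = [0.5, 0.3, 0.2]  # support of size 3

# A crude 1-bucket histogram: a single constant value over the whole domain.
hist = np.full(n, dist.mean())

# Standard L1 error sums |dist - hist| over ALL items in the domain,
# so the many zero-mass items also contribute to the error.
l1_error = np.abs(dist - hist).sum()

# Support-aware error restricts the sum to items in the support,
# ignoring the approximation error on zero-mass items.
support = dist > 0
support_error = np.abs(dist - hist)[support].sum()

print(l1_error)       # 1.625
print(support_error)  # 0.8125
```

Here half of the L1 error comes from the 13 zero-mass items, which the support-aware measure discards; this is the sense in which the standard measure penalizes a succinct histogram even when it approximates the support well.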
Author Information
Justin Chen (MIT)
Piotr Indyk (MIT)
Tal Wagner (MIT)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Streaming Algorithms for Support-Aware Histograms »
  Tue, Jul 19 through Wed, Jul 20, Hall E #1112
More from the Same Authors
- 2023 Oral: Fast Private Kernel Density Estimation via Locality Sensitive Quantization »
  Tal Wagner · Yonatan Naamad · Nina Mishra
- 2023 Poster: Fast Private Kernel Density Estimation via Locality Sensitive Quantization »
  Tal Wagner · Yonatan Naamad · Nina Mishra
- 2023 Poster: Data Structures for Density Estimation »
  Anders Aamand · Alexandr Andoni · Justin Chen · Piotr Indyk · Shyam Narayanan · Sandeep Silwal
- 2022 Poster: Faster Fundamental Graph Algorithms via Learned Predictions »
  Justin Chen · Sandeep Silwal · Ali Vakilian · Fred Zhang
- 2022 Spotlight: Faster Fundamental Graph Algorithms via Learned Predictions »
  Justin Chen · Sandeep Silwal · Ali Vakilian · Fred Zhang
- 2021 Poster: Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering »
  Shyam Narayanan · Sandeep Silwal · Piotr Indyk · Or Zamir
- 2021 Spotlight: Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering »
  Shyam Narayanan · Sandeep Silwal · Piotr Indyk · Or Zamir
- 2021 Poster: Faster Kernel Matrix Algebra via Density Estimation »
  Arturs Backurs · Piotr Indyk · Cameron Musco · Tal Wagner
- 2021 Spotlight: Faster Kernel Matrix Algebra via Density Estimation »
  Arturs Backurs · Piotr Indyk · Cameron Musco · Tal Wagner
- 2020 Poster: Scalable Nearest Neighbor Search for Optimal Transport »
  Arturs Backurs · Yihe Dong · Piotr Indyk · Ilya Razenshteyn · Tal Wagner
- 2019 Poster: Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm »
  Sepideh Mahabadi · Piotr Indyk · Shayan Oveis Gharan · Alireza Rezaei
- 2019 Poster: Scalable Fair Clustering »
  Arturs Backurs · Piotr Indyk · Krzysztof Onak · Baruch Schieber · Ali Vakilian · Tal Wagner
- 2019 Oral: Scalable Fair Clustering »
  Arturs Backurs · Piotr Indyk · Krzysztof Onak · Baruch Schieber · Ali Vakilian · Tal Wagner
- 2019 Oral: Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm »
  Sepideh Mahabadi · Piotr Indyk · Shayan Oveis Gharan · Alireza Rezaei
- 2018 Poster: Semi-Supervised Learning on Data Streams via Temporal Label Propagation »
  Tal Wagner · Sudipto Guha · Shiva Kasiviswanathan · Nina Mishra
- 2018 Oral: Semi-Supervised Learning on Data Streams via Temporal Label Propagation »
  Tal Wagner · Sudipto Guha · Shiva Kasiviswanathan · Nina Mishra