Timezone: »
Poster
CountSketches, Feature Hashing and the Median of Three
Kasper Green Larsen · Rasmus Pagh · Jakub Tětek
In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (highdimensional) Euclidean vector $v$ to a vector of dimension $(2t1) s$, where $t, s > 0$ are integer parameters. It is known that a CountSketch allows estimating coordinates of $v$ with variance bounded by $\v\_2^2/s$. For $t > 1$, the estimator takes the median of $2t1$ independent estimates, and the probability that the estimate is off by more than $2 \v\_2/\sqrt{s}$ is exponentially small in $t$. This suggests choosing $t$ to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant $t$. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to $O(\min\{\v\_1^2/s^2,\v\_2^2/s\})$ when $t > 1$.
That is, the variance decreases proportionally to $s^{2}$, asymptotically for large enough $s$.
Author Information
Kasper Green Larsen (Aarhus University, MADALGO)
Rasmus Pagh (University of Copenhagen)
Jakub Tětek (University of Copenhagen)
Related Events (a corresponding poster, oral, or spotlight)

2021 Spotlight: CountSketches, Feature Hashing and the Median of Three »
Fri Jul 23rd 01:25  01:30 AM Room None
More from the Same Authors

2021 : Differentially private sparse vectors with low error, optimal space, and fast access »
Martin Aumüller · Christian Lebeda · Rasmus Pagh 
2021 Poster: Differentially Private Aggregation in the Shuffle Model: Almost Central Accuracy in Almost a Single Message »
Badih Ghazi · Ravi Kumar · Pasin Manurangsi · Rasmus Pagh · Amer Sinha 
2021 Spotlight: Differentially Private Aggregation in the Shuffle Model: Almost Central Accuracy in Almost a Single Message »
Badih Ghazi · Ravi Kumar · Pasin Manurangsi · Rasmus Pagh · Amer Sinha 
2020 Poster: NearTight MarginBased Generalization Bounds for Support Vector Machines »
Allan Grønlund · Lior Kamma · Kasper Green Larsen 
2019 Poster: Optimal Minimal Margin Maximization with Boosting »
Alexander Mathiasen · Kasper Green Larsen · Allan Grønlund 
2019 Oral: Optimal Minimal Margin Maximization with Boosting »
Alexander Mathiasen · Kasper Green Larsen · Allan Grønlund