Spotlight
CountSketches, Feature Hashing and the Median of Three
Kasper Green Larsen · Rasmus Pagh · Jakub Tětek
Abstract:
In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector to a vector of dimension , where are integer parameters. It is known that a CountSketch allows estimating coordinates of with variance bounded by . For , the estimator takes the median of independent estimates, and the probability that the estimate is off by more than is exponentially small in . This suggests choosing to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant . Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to when .
That is, the variance decreases proportionally to , asymptotically for large enough .
Chat is not available.