Skip to yearly menu bar Skip to main content


Spotlight

CountSketches, Feature Hashing and the Median of Three

Kasper Green Larsen · Rasmus Pagh · Jakub Tětek

Abstract: In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector v to a vector of dimension (2t1)s, where t,s>0 are integer parameters. It is known that a CountSketch allows estimating coordinates of v with variance bounded by v22/s. For t>1, the estimator takes the median of 2t1 independent estimates, and the probability that the estimate is off by more than 2v2/s is exponentially small in t. This suggests choosing t to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant t. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to O(min{v12/s2,v22/s}) when t>1. That is, the variance decreases proportionally to s2, asymptotically for large enough s.

Chat is not available.