Timezone: »

CountSketches, Feature Hashing and the Median of Three
Kasper Green Larsen · Rasmus Pagh · Jakub Tětek

Thu Jul 22 09:00 PM -- 11:00 PM (PDT) @ None #None
In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector $v$ to a vector of dimension $(2t-1) s$, where $t, s > 0$ are integer parameters. It is known that a CountSketch allows estimating coordinates of $v$ with variance bounded by $\|v\|_2^2/s$. For $t > 1$, the estimator takes the median of $2t-1$ independent estimates, and the probability that the estimate is off by more than $2 \|v\|_2/\sqrt{s}$ is exponentially small in $t$. This suggests choosing $t$ to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant $t$. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to $O(\min\{\|v\|_1^2/s^2,\|v\|_2^2/s\})$ when $t > 1$. That is, the variance decreases proportionally to $s^{-2}$, asymptotically for large enough $s$.

Author Information

Kasper Green Larsen (Aarhus University, MADALGO)
Rasmus Pagh (University of Copenhagen)
Jakub Tětek (University of Copenhagen)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors