Timezone: »
An important question that arises in the study of high dimensional vector representations learned from data is: given a set D of vectors and a query q, estimate the number of points within a specified distance threshold of q. Our algorithm uses locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via an unbiased estimator that uses importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and query complexity of our scheme, and demonstrate the effectiveness of our algorithm by experiments on a standard word embedding dataset.
Author Information
Xian Wu (Stanford University)
Moses Charikar (Stanford University)
Vishnu Natchu
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Oral: Local Density Estimation in High Dimensions »
Thu Jul 12th 03:40 -- 03:50 PM Room K11
More from the Same Authors
-
2019 Poster: Rehashing Kernel Evaluation in High Dimensions »
Paris Siminelakis · Kexin Rong · Peter Bailis · Moses Charikar · Philip Levis -
2019 Oral: Rehashing Kernel Evaluation in High Dimensions »
Paris Siminelakis · Kexin Rong · Peter Bailis · Moses Charikar · Philip Levis -
2018 Poster: Hierarchical Clustering with Structural Constraints »
Evangelos Chatziafratis · Rad Niazadeh · Moses Charikar -
2018 Oral: Hierarchical Clustering with Structural Constraints »
Evangelos Chatziafratis · Rad Niazadeh · Moses Charikar