# approximate nearest neighbor

• Aryeh Kontorovich and Roi Weiss

### Maximum Margin Multiclass Nearest Neighbors (pdf)

We develop a general framework for margin-based multicategory classification in metric spaces. The basic work-horse is a margin-regularized version of the nearest-neighbor classifier. We prove generalization bounds that match the state of the art in sample size $n$ and significantly improve the dependence on the number of classes $k$. Our point of departure is a nearly Bayes-optimal finite-sample risk bound independent of $k$. Although $k$-free, this bound is unregularized and non-adaptive, which motivates our main result: Rademacher and scale-sensitive margin bounds with a logarithmic dependence on $k$. As the best previous risk estimates in this setting were of order $\sqrt k$, our bound is exponentially sharper. From the algorithmic standpoint, in doubling metric spaces our classifier may be trained on $n$ examples in $O(n^2\log n)$ time and evaluated on new points in $O(\log n)$ time.

• Jacob Gardner and Matt Kusner and Zhixiang and Kilian Weinberger and John Cunningham

### Bayesian Optimization with Inequality Constraints (pdf)

Bayesian optimization is a powerful framework for minimizing expensive objective functions while using very few function evaluations. It has been successfully applied to a variety of problems, including hyperparameter tuning and experimental design. However, this framework has not been extended to the inequality-constrained optimization setting, particularly the setting in which evaluating feasibility is just as expensive as evaluating the objective. Here we present constrained Bayesian optimization, which places a prior distribution on both the objective and the constraint functions. We evaluate our method on simulated and real data, demonstrating that constrained Bayesian optimization can quickly find optimal and feasible points, even when small feasible regions cause standard methods to fail.

• Anoop Cherian

### Nearest Neighbors Using Compact Sparse Codes (pdf)

In this paper, we propose a novel scheme for approximate nearest neighbor (ANN) retrieval based on dictionary learning and sparse coding. Our key innovation is to build compact codes, dubbed SpANN codes, using the active set of sparse coded data. These codes are then used to index an inverted file table for fast retrieval. The active sets are often found to be sensitive to small differences among data points, resulting in only near duplicate retrieval. We show that this sensitivity is related to the coherence of the dictionary; small coherence resulting in better retrieval. To this end, we propose a novel dictionary learning formulation with incoherence constraints and an efficient method to solve it. Experiments are conducted on two state-of-the-art computer vision datasets with 1M data points and show an order of magnitude improvement in retrieval accuracy without sacrificing memory and query time compared to the state-of-the-art methods.

• Ting Zhang and Chao Du and Jingdong Wang

### Composite Quantization for Approximate Nearest Neighbor Search (pdf)

This paper presents a novel compact coding approach, composite quantization, for approximate nearest neighbor search. The idea is to use the composition of several elements selected from the dictionaries to accurately approximate a vector and to represent the vector by a short code composed of the indices of the selected elements. To efficiently compute the approximate distance of a query to a database vector using the short code, we introduce an extra constraint, constant inter-dictionary-element-product, resulting in that approximating the distance only using the distance of the query to each selected element is enough for nearest neighbor search. Experimental comparison with state-of-the-art algorithms over several benchmark datasets demonstrates the efficacy of the proposed approach.

2013-2014 ICML | International Conference on Machine Learning