Thanks to all reviewers for the comments (R#1, R#5, R#6). It seems that R#5 missed a central point of our work, so we focus our rebuttal on this.

Regarding R#1, R#6:
- Thanks for the references about Lipschitz maps -- we will compare against them in the intro. Note that, in terms of lower bounds, Lipschitz and bi-Lipschitz are not the same (as also explained in R#1). There is a fundamental conceptual difference (albeit an easy-to-miss one). A 1-Lipschitz map is in general not learnable; however, a 1-bi-Lipschitz map can be learned with only log(n) queries (a minimal sketch appears at the end of this response). So for 1-bi-Lipschitz maps no lower bound exists, while for (1+eps)-bi-Lipschitz maps we have (technically involved) lower bounds for natural learning settings. One way to think about it: 1-bi-Lipschitz = very strong metric structure; (1+eps)-bi-Lipschitz = strong metric structure.
- Yes, it is open for exactly zero generalization error (we will mention this in the text).
- Both R#1 and R#6 hinted that we should add something about the experts in our experiments (for now, the Google/Bing experiments are only in the appendix). We will briefly explain this in the main text.

R#5 is the main issue:
- The review raises an issue about regression problems (i.e., problems of a continuous nature). This issue is not directly relevant to our work and rests on a technical misconception (see below). Unfortunately, the negative recommendation of the review is based on this misconception. As the review suggested, we explain here.

Here is the more general geometric background (it might clear things up for R#5; see l172-182 for a succinct explanation). A continuous metric space, e.g. (R^n with the ell_2 distance), is *not* a generalization, as R#5 claims (in no technical sense), of the structured discrete spaces we consider. Lower bounds for ell_2 do not translate into (non-trivial) lower bounds for Hamming or edit distance. Technically, this is impossible for every (i.e., including very complicated) translation that maps, for example, ell_2 to Hamming. More precisely, *every* discretization of R^n, under *every* embedding into ({0,1}^n, ell_1), suffers distortion Omega(sqrt(log n)); see Matousek's text (cited), p. 380-382, Thm 15.4.1. This is proved using the "short diagonals lemma", commonly used to study relations between discrete and continuous geometry (we recall its statement below). Furthermore, there is no hope even for local embeddings [Arora-Lovasz-Newman-Rabani-Rabinovich-Vempala'06]. Note that these technical limitations are present even before proving the lower bound itself. Thus, at a purely technical level, our results were not known before. Also, we do not think that our (25-page-long) arguments are obvious. Less technically, by analogy, consider the comparison between regression and structured prediction problems.

To build further geometric intuition, observe that the Hamming cube is a good expander: every (adversarially chosen) subset of vertices (of size 2^Omega(n)) has a large neighborhood. The mappings we wish to learn map one copy of the Hamming cube to another. Now, if such a mapping almost preserves distances, then by querying a small number of points we obtain a good estimate for the entire neighborhood; in fact, because of expansion, this holds for all neighborhoods simultaneously. Thus, the expansion property poses technical challenges in proving a lower bound. One might expect that we can use the bi-Lipschitz property and learn the map by querying only a few points. Indeed, this is true when the bi-Lipschitz constant is 1 (again, see the sketch at the end of this response). The interesting part is that when the constant is relaxed slightly above 1, this intuition breaks down (and strong lower bounds emerge).
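For R#5's convenience, here is the short diagonals lemma as we recall it from Matousek's text (the exact constants are immaterial for this discussion): for any four points $x_1, x_2, x_3, x_4$ in an $\ell_2$ space,

$$\|x_1-x_3\|_2^2 + \|x_2-x_4\|_2^2 \;\le\; \|x_1-x_2\|_2^2 + \|x_2-x_3\|_2^2 + \|x_3-x_4\|_2^2 + \|x_4-x_1\|_2^2,$$

i.e., in squared lengths, the diagonals of a quadrilateral are dominated by its sides. On the Hamming cube with ell_1 this fails on every 2-face: the 4-cycle x, x XOR e_i, x XOR e_i XOR e_j, x XOR e_j has four sides of length 1 and two diagonals of length 2 (8 > 4 in squared lengths). Aggregating this incompatibility over the 2-faces of the cube is, as we recall, what drives the Omega(sqrt(log n)) distortion bound of Thm 15.4.1.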
- About "new tools": see e.g., the algorithm that does the probabilistic local embedding in Lemma 10 of the appendix. We use this algorithm for a local reduction argument. Maybe this algorithm can find NLP applications (e.g. clustering, word alignment etc) although we do not have a concrete development yet. - We can change the title by adding the term "Limits" or "Limitations". - About l83-86: although the space has exponentially many points it can still be learned with poly-many samples and this is because of the very strong (i.e. not free) metric structure. - About l149: we will rephrase. - About l580-587: we do not agree with this remark. It seems that the other reviews do not agree as well. We explicitly refer to the N^3 integer points on the lateral surface of size N^2 x N.