We sincerely thank all the reviewers for their insightful comments. We have incorporated all specific comments into our current draft. We enclose our responses below.

Assigned_Reviewer_1: Thank you very much for the comments.

Regarding Corollary 4.1: Term A is related to the “balancedness” property. The goal of multiplying by HR in the preprocessing step (H is the Hadamard matrix and R is the random diagonal matrix) is to make each input vector balanced, i.e. to spread the mass of the vector across all the dimensions in an approximately uniform way (a small illustrative sketch is given below). This property is required to obtain the theoretical results in the structured setting (it was unnecessary in the unstructured setting) and does not depend on the number of projected dimensions.

Regarding nonlinearity: The nonlinearity (the sign function) is important because it preserves approximate information about the angle between vectors while filtering out information about their lengths (see the sign-function sketch below).
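The following is a minimal NumPy sketch of the HR preprocessing (illustrative only, not the implementation from the paper; the helper name hadamard_balance is ours). Multiplying a maximally unbalanced vector by HR spreads its mass approximately uniformly; in practice the product would be computed with the fast Walsh-Hadamard transform in O(n log n) time rather than with a dense matrix.

    import numpy as np
    from scipy.linalg import hadamard

    def hadamard_balance(x, rng):
        # Preprocessing x -> HRx: R is a random diagonal +/-1 matrix and
        # H is the normalized Hadamard matrix (dimension: a power of two).
        n = x.shape[0]
        H = hadamard(n) / np.sqrt(n)         # orthonormal, so lengths are preserved
        r = rng.choice([-1.0, 1.0], size=n)  # the diagonal of R
        return H @ (r * x)

    rng = np.random.default_rng(0)
    x = np.zeros(64)
    x[0] = 1.0                               # maximally unbalanced unit vector
    y = hadamard_balance(x, rng)
    print(np.max(np.abs(y)))                 # 0.125 = 1/sqrt(64): mass is spread out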
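To illustrate the role of the sign nonlinearity, here is a second sketch (again purely illustrative): for a Gaussian projection matrix G, the fraction of coordinates on which sign(Gx) and sign(Gy) disagree concentrates around theta/pi, where theta is the angle between x and y, independently of the lengths of x and y.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 128, 4096                     # input dimension, number of projections
    G = rng.standard_normal((m, n))      # unstructured Gaussian projection matrix

    x = rng.standard_normal(n)
    y = 3.0 * x + 0.5 * rng.standard_normal(n)   # similar direction, different length

    hx, hy = np.sign(G @ x), np.sign(G @ y)      # nonlinear embeddings
    theta_est = np.pi * np.mean(hx != hy)        # P[signs differ] = theta / pi

    cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    print(theta_est, np.arccos(cos))     # the two values nearly coincide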
Assigned_Reviewer_2: Thank you very much for the comments. The reviewer is right: the slower rate mentioned by the reviewer and the decay that is not linear are consequences of the structured setting, where the quality of the nonlinear embedding is slightly worse because of the dependencies between the entries of the structured matrix.

Assigned_Reviewer_3: Thank you very much for the comments. We will try to improve the clarity of the work further, even though the clarity was found to be excellent and the paper easy to follow by both Assigned_Reviewer_1 and Assigned_Reviewer_2.

Regarding projections: Linear projections need to accompany the nonlinear mapping; otherwise the angular distance is not preserved. Furthermore, the linear projections are typically responsible for dimensionality reduction (since the number of rows of the projection matrix is smaller than the number of columns). Finally, the linear projection mechanism can be designed as a structured one. That provides computational speed-ups (the standard FFT mechanism for fast structured matrix-vector multiplication), memory savings (clearly sub-quadratic or even linear space for matrix storage), and a reduction in randomness usage (i.e. the number of required random values); see the FFT sketch at the end of this response. The nonlinear part of the computation cannot be designed in the structured way.

Regarding Definition 3.1: We meant *one* left/right shift (standard notation in the literature).

Regarding Remark 3.1: They are not the same. Consider a simple 2x2 example:

    (g1 g2)
    (g3 g1)

Here the first row is (g1 g2) and the second row is (g3 g1). This matrix is constant along each descending diagonal from left to right, and all three descending diagonals are independent if g1, g2, and g3 are independent (thus it is Toeplitz). However, the second row is not obtained from the first one by a left/right shift if g2 \neq g3 (thus it is not circulant).

Regarding Remark 3.2: Yes, the reviewer understood it right; we need to be consistent (either all left shifts or all right shifts).

Regarding experiments: The experimental section takes two pages of the main body of the paper (plus additional results in the supplementary material) and contains a detailed discussion. The major take-home message, which is non-trivial, is that all the considered structured matrices achieve reasonable performance in comparison to fully random matrices. Specifically, we show: a) the dependence of the performance on the size of the hash and the size of the reduction for different structured matrices, and b) the performance of different structured matrices not only when used with neural networks but also with a 1-NN classifier. We give a thorough and long discussion of the experimental results in the experimental section, starting from line 758 and continuing to the end of the section. The experiments confirm our novel theoretical results, which were also judged as excellent and offering a substantial and novel contribution by Assigned_Reviewer_1.
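To make the computational and storage claims for structured projections concrete, below is a minimal sketch of the standard FFT-based multiplication by a circulant matrix (illustrative only; the helper name circulant_matvec is ours). It runs in O(n log n) time and stores only the n random values defining the matrix; for dimensionality reduction one would typically keep only the first m < n coordinates of the result.

    import numpy as np
    from scipy.linalg import circulant

    def circulant_matvec(c, x):
        # Multiply the circulant matrix whose first column is c by x via the
        # FFT, in O(n log n) time, without materializing the full matrix.
        return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

    rng = np.random.default_rng(0)
    n = 8
    c = rng.standard_normal(n)    # the only n random values that must be stored
    x = rng.standard_normal(n)

    # Dense reference: scipy's circulant(c) has c as its first column.
    print(np.allclose(circulant(c) @ x, circulant_matvec(c, x)))   # True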