We would like to thank the reviewers for their insightful feedback and relevant comments. The following remarks address their main concerns/questions:

* Consistency of the definitions (reviewer_1) – the set of polynomial ranking rules, Definition 3, is defined by R_{P,N} = { r : (x,x') -> sgn(P(x) - P(x')) | P is a polynomial function of degree N }. This set contains ranking rules that are not polynomial functions themselves (they take values in {-1, 0, 1}) but are induced by polynomial functions (see line 205). The same holds for the set of continuous rankings R_{\infty}: those ranking rules are not continuous (they take values in {-1, 0, 1}) but are induced by continuous functions.

* Technical assumptions (reviewer_5) – the function f is implicitly assumed to be bounded, and the convexity of the search space is mainly needed for the proofs rather than for practical reasons. Moreover, at line 512, we also assume that the ranking rule induced by the unknown function belongs to one of the ranking structures of the sequence {R_n}_{n \in N}, which is quite weak since the sequence is assumed to be infinite. Morse theory is connected to our approach, but the relationship is limited since we are concerned with volumes rather than topology. The references from Bayesian optimization and global optimization are indeed related to our work; we will include those citations in the final version and clarify the differences with our setup.

* Aggressive strategy (reviewer_6) – we do believe that a more aggressive strategy (e.g., a Bayesian strategy similar to the one used in SVM active learning, Tong, 2002) would lead to better empirical performance. However, the main purpose of the paper is to introduce ranking concepts in non-smooth global optimization and to derive a tractable analysis, which is harder to obtain with more aggressive procedures. Moreover, since the algorithm requires only a small number of parameters, it can be applied to a wide variety of problems, including the optimization of weights in deep learning. The choice of the sequence {R_n}_{n \in N} depends on the problem, but we strongly advise using polynomial rankings because of their generic nature.

* Experiments – the algorithm was first designed to address modern issues in machine learning, such as hyperparameter calibration of neural networks or cross-validation. However, given the wide variety of problems and the difficulty of selecting (and tuning) competing algorithms (e.g., Bayesian algorithms often require the choice of a kernel, which strongly influences their behavior), an exhaustive comparison of algorithms requires a full investigation of its own (see Bandits attack function optimization, Preux 2014, for a recent paper entirely dedicated to such comparisons). The main purpose of this section is to show, as a first study, that the approach we consider can lead to very competitive algorithms in various cases.

* Implementation and significance in practice – a key feature of the method we propose is that it does not explicitly need to maintain the set of consistent ranking rules (defined in Section 3.1). Indeed, the trick we use (detailed in Section 5) was first proposed in active learning for the same reasons (see Dasgupta et al. 2011, for instance). More precisely, there exists a ranking rule that is consistent with the data if and only if a data-dependent polyhedron is empty (Corollary 2).
One can then use the following rejection method (details can be found in Section 5) to generate a query: (1) sample a point uniformly over the search space X; (2) build the data-dependent polyhedron defined in Corollary 2; (3) check whether the polyhedron is empty using the simplex algorithm (this check takes approximately 5*10^-2 s on a laptop with one core); see the sketch at the end of this response. Hence the algorithm can run without explicitly maintaining the set of consistent ranking rules.

* Numerical complexity – as indicated above, the expected complexity of generating X_{n+1} is driven by the complexity of linear programming times the ratio of the volumes of the search area and the sampling area.
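For concreteness, here is a minimal Python sketch of the rejection step, assuming the polyhedron of Corollary 2 is available as a system of linear inequalities A w <= b on the coefficients of the candidate polynomials. The helper build_polyhedron is hypothetical and only stands in for that construction, the emptiness test relies on an off-the-shelf LP solver rather than on our actual implementation, and the acceptance condition simply mirrors the statement above:

```python
import numpy as np
from scipy.optimize import linprog


def polyhedron_is_empty(A, b):
    """Return True when the polyhedron {w : A w <= b} is empty.

    Emptiness is decided with a single feasibility LP (zero objective):
    the polyhedron is empty exactly when the LP is infeasible.
    """
    n_vars = A.shape[1]
    res = linprog(c=np.zeros(n_vars), A_ub=A, b_ub=b,
                  bounds=[(None, None)] * n_vars)  # free variables
    return res.status == 2  # SciPy status 2 means "infeasible"


def sample_next_query(build_polyhedron, lower, upper, rng=None):
    """Rejection method sketch: draw uniform candidates over the box
    [lower, upper] and return the first one passing the emptiness test.

    build_polyhedron(x) is a hypothetical callback returning (A, b),
    the data-dependent polyhedron of Corollary 2 for candidate x.
    """
    if rng is None:
        rng = np.random.default_rng()
    while True:
        x = rng.uniform(lower, upper)     # step 1: uniform sample over X
        A, b = build_polyhedron(x)        # step 2: data-dependent polyhedron
        if polyhedron_is_empty(A, b):     # step 3: LP emptiness test
            return x                      # acceptance criterion as stated above
```

Since a single feasibility LP takes roughly 5*10^-2 s, the expected cost of one query is this LP time multiplied by the expected number of rejected candidates, i.e., the volume ratio mentioned in the numerical complexity remark.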