We thank the three reviewers for their careful reading of the paper and their very useful comments and suggestions, which we will fully implement should the paper be accepted. Two of the reviewers raised specific concerns regarding the correctness of the main results and asked for clarifications. We provide these clarifications below, and hope that they will satisfy the reviewers and lead to the improved ratings mentioned in the reviews.

Reviewer 1:

1. "The proof of Theorem 1 … Under H_0 and under H_k, the distribution of arm k does not have the same support, which creates an extra difficulty: P_k^H is not absolutely continuous with respect to P_0^H (which has a smaller support for arm k) and the Radon-Nikodym derivative dP_k^H / dP_0^H that appears in the display l.438 cannot be defined. However, I agree (in spirit) that, on the event C_k (where no observation outside the support of P_0 are made), one should be able to write something like that, but it's not clear to me how."

Thank you; this indeed needs to be better clarified and formalized. The Radon-Nikodym (R-N) derivative is fully defined only for two probability measures (say P_k and P_0) such that P_k << P_0 (i.e., P_k absolutely continuous w.r.t. P_0), which is not the case here. We can, however, define this derivative on any event over which this domination holds. Let C be an event, let P_k|C and P_0|C be the (sub-probability) measures restricted to C (i.e., P_k|C(B) = P_k(C \cap B)), and suppose that P_k|C << P_0|C (which is the case here, as the reviewer observed). We can now define the R-N derivative RN(h) of these restricted measures, and in particular obtain {RN(h), h \in C} such that, for any measurable g,
  \int_C g(h) P_k(dh) = \int_C g(h) RN(h) P_0(dh),
which is exactly what we use here. In the paper, in addition to adding a verbal explanation, we will modify display l.438 by erasing f_k(h) and adding the qualifier h \in C_k, and will add h \in S_k in display l.451.

2. Regarding Theorem 1 (continued): "Also, how to obtain this equation should be clarified. It should be clearly mentioned that f_k(h) is a definition, and that (1-gamma_k/F_k(\bar{\mu}_k) comes from a ratio of densities (do we need to assume that F_k(mu) has a density to write this?)."

As mentioned above, the confusing term f_k(h) will be eliminated from this equation. The mentioned ratio is actually a ratio of distribution functions, so no densities are required.

3. Regarding Theorem 2: "However, I am not convinced that (5) trivially holds for all arms…":

First, there is a typo in line 502: in the definition of N_0, it should be G_*(\epsilon_0) instead of \epsilon_0. Let us now explain why the condition indeed holds for all arms. The stopping condition in step 5 of the algorithm (l.512) is equivalent to C(k^*) > L/G_*(\epsilon). Note that C(k) is incremented by 1 at a time, and only for k = k^*. Therefore, no counter can exceed \lfloor L/G_*(\epsilon) \rfloor + 1 (a small numerical sketch of this counting argument is given after these replies).

4. Minor comment 1: Yes, in Theorem 3 (as well as Theorem 2) we assume \epsilon < \epsilon_0. We will state that explicitly.
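To make the counting argument in reply 3 above concrete, here is a minimal numerical sketch. It is not the paper's algorithm: the number of arms, the round-robin leader rule, and names such as G_star_eps are placeholders we introduce only for illustration.

```python
import math

def max_counter_value(L, G_star_eps, K=5, max_rounds=100_000):
    """Counting argument: exactly one counter (the current leader's) is
    incremented by 1 per round, and the run stops as soon as that counter
    exceeds L / G_star_eps; hence no counter can ever exceed
    floor(L / G_star_eps) + 1, regardless of how the leader is chosen."""
    threshold = L / G_star_eps
    C = [0] * K
    for t in range(max_rounds):
        k_star = t % K              # placeholder leader rule; the bound is rule-independent
        C[k_star] += 1
        if C[k_star] > threshold:   # stopping condition (cf. step 5, l.512)
            break
    assert max(C) <= math.floor(threshold) + 1
    return max(C)

print(max_counter_value(L=20, G_star_eps=3.0))  # prints 7 = floor(20/3) + 1
```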
Reviewer 3:

"In the proof of Theorem 1, lines 473-479, I do not understand how you cope with k'. I think I follow your argument lower bounding E_0[T_k] for all k \ne k'. On the other hand, T is the sum of T_k over all k. Hence, you can lower bound in expectation at least all the summands, T_k, except for T_{k'}. However your sum in the display on line 477 is over all k \ne k^*. I do not follow how knowing that t_{k^*} \ge t_{k'} helps you to do this. Can you please explain?"

We gladly clarify this argument (here as well as in the paper). We start with
  E(T) \ge \sum_{k \ne k'} E(T_k) \ge \sum_{k \ne k'} t_k.
Since t_{k^*} \ge t_{k'}, this gives
  E(T) \ge \sum_{k \ne k'} t_k \ge \sum_{k \ne k^*} t_k.
We can now substitute t_k from l.362 to obtain the inequality in l.477.
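In the revision we will also spell out the step behind the second inequality: the two sums differ in a single term each (the first includes t_{k^*} but not t_{k'}, the second the other way around), so that
  \sum_{k \ne k'} t_k - \sum_{k \ne k^*} t_k = t_{k^*} - t_{k'} \ge 0.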