We thank the reviewers for their comments, corrections, and suggestions. We will make these changes in the final version of the paper.

Response to Assigned Reviewer 4:

"What is the motivation for selecting the objective function which maximizes $\|\phi(F) - \phi(H)\|$?"

From the convexity of $d$ and Proposition 4, we have that $\|\phi(F) - \phi(H)\|$ is the slope of $d$ at infinity, and it essentially forms the "roof" of the ideal step-function-shaped curve for $\nabla d$. Intuitively, the higher this roof, the better suited the curve is for the gradient thresholding strategy.

Response to Assigned Reviewer 5:

1. "The analysis does not utilize the uniform bound. Since we need to optimize lambda, the difference between the population and empirical C-distance should be bounded 'uniformly' over all lambda."

The bounds in Lemmas 6 and 7 are indeed of the uniform type: both hold simultaneously for all specified $\lambda$ under the event $E_\delta$, which is a fixed event that does not depend on $\lambda$. The bounds do depend on $\lambda^*$, but this does not affect the validity of our results, as $\lambda^*$ is taken to be a fixed but unknown constant.

2. "The convergence rates for the value and gradient thresholding ..."

Since the submission, we have discovered an alternative way of setting the threshold in the value thresholding estimator that achieves the same $1/\sqrt{m}$ rate. In practice, however, that method would be very sensitive to the choice of this threshold, as can be inferred from Figure 2.

3. "In the numerical experiments, the two methods KM_1 and KM_2 show different performance. It would be helpful if there would be explanation about why this difference occurs."

The difference is essentially due to the differing threshold strategies. The threshold for KM1 ($1/\sqrt{m}$) is mostly smaller than that for KM2 (of the form $c + c'/\sqrt{m}$), and hence most KM1 errors are underestimates while most KM2 errors are overestimates.
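To make the "roof" intuition in the response to Reviewer 4 concrete, a one-line sketch using the standard convexity fact that the slope at infinity of a convex function equals the limit of its gradient (notation as in the paper; the step-function shape is an idealization):

```latex
d \text{ convex} \;\Longrightarrow\;
\nabla d(\lambda) \nearrow \lim_{\lambda \to \infty} \frac{d(\lambda)}{\lambda}
= \|\phi(F) - \phi(H)\|,
```

so any gradient threshold must be placed below the roof $\|\phi(F) - \phi(H)\|$; the higher the roof, the wider the margin available for choosing the threshold.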
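As a toy illustration of the threshold difference discussed in point 3, the sketch below compares a shrinking KM1-style threshold ($1/\sqrt{m}$) with a KM2-style threshold ($c + c'/\sqrt{m}$) on a stylized increasing gradient curve. The constants, curve, and function names are purely illustrative assumptions, not the paper's implementation: a smaller threshold is crossed earlier, yielding a smaller estimate.

```python
import math

def threshold_km1(m):
    # KM1-style threshold: shrinks to 0 as m grows (illustrative)
    return 1.0 / math.sqrt(m)

def threshold_km2(m, c=0.5, c_prime=1.0):
    # KM2-style threshold: c + c'/sqrt(m); c, c' are made-up constants
    return c + c_prime / math.sqrt(m)

def gradient_crossing(grads, thresh):
    # Index where the gradient curve first exceeds the threshold;
    # an earlier crossing corresponds to a smaller estimate.
    for i, g in enumerate(grads):
        if g > thresh:
            return i
    return len(grads)

# A stylized, monotonically increasing gradient curve: 0.0, 0.1, ..., 1.9
grads = [0.1 * k for k in range(20)]
m = 400
i1 = gradient_crossing(grads, threshold_km1(m))  # small threshold, early crossing
i2 = gradient_crossing(grads, threshold_km2(m))  # larger threshold, late crossing
assert i1 < i2  # the KM1-style rule estimates below the KM2-style rule
```

On the same curve, the KM1-style rule crosses earlier than the KM2-style rule, which mirrors why KM1 errors tend to be underestimates and KM2 errors overestimates.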