We thank all the reviewers for their detailed and insightful comments!

## Reviewer_1

Thanks for pointing out that a comparison to the linear case may be helpful. We have done these experiments. In these experiments, the true features were recovered after around 200 examples. We will add this to our results if the paper is accepted. Note that the number of examples needed depends on several factors, such as the complexity of the target, the noise rate, and the dimensionality. Deriving analytical finite-sample results of this kind for different classes of link functions is an interesting direction for future work.

## Reviewer_2

Thanks for pointing out the reference (S. Tateishi et al.), which is certainly of interest and nicely complements our work. We will cite it. However, S. Tateishi et al. assume that the nonlinear link function between the observation y and the feature X can be expressed as a linear combination of a finite number of Gaussian basis functions, which may restrict the class of admissible link functions. We do not impose such an assumption in our work. Our work is also more focused on theoretical justification.

Extending the work to non-Gaussian errors and data is a very interesting direction. Our experiments suggest that the theory may extend to other distributions, such as the uniform distribution. However, a theoretical justification will involve non-trivial extensions of foundational results, including Stein's lemma. We will pursue this in future work.

## Reviewer_3

Thanks for pointing out the need for experiments with the group lasso. We have run experiments for a variety of link functions with the group lasso, analogous to Figure 2 for the standard lasso. The results are consistent with the theory. We will add these to the paper if it is accepted.

We thank all the reviewers again for their suggestions and positive feedback!