Paper ID: 93
Title: On the Consistency of Feature Selection With Lasso for Non-linear Targets

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
In this paper, the authors theoretically analyze the selection consistency of the Lasso when the model is misspecified. They prove that, under certain conditions, the Lasso and the group Lasso are still able to recover the correct features under model misspecification. The authors present numerical experiments that substantiate their theoretical analysis.

Clarity - Justification:
The paper is well written. Existing results are thoroughly covered and explained, and the notation is well defined. In particular, the theoretical preliminaries are well presented, which makes the main results much easier to follow.

Significance - Justification:
The paper provides novel contributions to the Lasso literature. Unlike previous work, it establishes the selection consistency of the Lasso for nonlinear link functions under specific assumptions.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The claimed results look quite interesting and promising, especially the mu-sign consistency analysis of the Lasso. The extension to the group Lasso is well discussed and presented. I have some small remarks on this paper:

The authors introduce both estimation consistency and selection consistency on Page 2 of Section 2; however, they focus only on selection consistency throughout the paper. You may want to emphasize this point at the beginning of the paper.

In the proof of Theorem 3.4, you may remove Stein's lemma (Lemma 3.5) from the main text; it may be better to state and explain it in the appendix.

On Page 6, the extension of the analysis to the group Lasso is not sufficiently discussed. The authors may provide more detailed information about the proof of Corollary 3.5.2.
In the experiments, the authors illustrate their theoretical results for the Lasso, which verify Theorem 3.4. However, there is no experiment for the group Lasso case; Corollary 3.5.2 is not covered by the experimental setting.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This work investigates whether the Lasso can provably recover the true set of features even when a linear model is fit to nonlinear targets. The authors find that when the data and the link function satisfy certain conditions, the Lasso is able to recover the true features.

Clarity - Justification:
This paper is clearly written and well organized.

Significance - Justification:
Proving consistency in this setting is an interesting and broadly applicable result. The Lasso is currently used in many situations where it is unclear whether the target is perfectly linear in the features.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
For the numerical experiments, while it is clear that the probability of successful selection converges to 1 as the number of samples increases, it would be useful to compare the curves for the nonlinear cases with the curve for the linear case. For the nonlinear cases, it seems that well over 1000 samples are required for convergence, which is not practical in many settings where the Lasso is used. It would be useful to show, on the same plots, how many samples the linear case takes to converge. If the convergence rates for a given number of samples differ significantly, this would have practical implications.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The popular Lasso method (Tibshirani, 1996) is used in this work for feature selection.
This can be related to the work of Chen et al. (1998) in the signal processing context, or, more closely, to a series of works starting with Knight and Fu (2000) that tried to prove the consistency of the method and then to extend it to feature selection (Lee et al., 2015). However, and contrary to previous works, the goal of this work is to apply the Lasso to data whose relationship with the covariates is not directly linear.

Clarity - Justification:
The paper is easy to follow and understand, and no major revision is required in this respect.

Significance - Justification:
Although this is an interesting work, nicely written and mathematically complete, I do not think it constitutes a turning point in its line of research. In fact, it merely extends the scenarios in which the Lasso can be applied to include those in which the response is not linearly related to the covariates. What happens if the data have non-Gaussian errors? Is the Lasso still applicable?

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I found this work interesting, well written, and easy to follow. Finding solutions to nonlinear relationships in the data is always a difficult task, and every contribution helps. Still, this has been done previously in other contexts (see "Nonlinear regression modeling via the lasso-type regularization" by S. Tateishi, H. Matsui, and S. Konishi, Journal of Statistical Planning and Inference, 2010). In my opinion, something considerably more relevant would be to consider the application of Lasso-type methods for selecting features in the context of non-Gaussian data.

=====
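The comparison requested in Review #2 (support-recovery probability of the Lasso for a linear vs. a nonlinear link, as a function of the sample size) can be sketched as a small simulation. The following is a minimal, hypothetical setup, not the authors' actual experimental design: the cubic link, the scikit-learn `Lasso` estimator, and all dimensions, noise levels, and penalty values are illustrative assumptions.

```python
# Hypothetical sketch of the experiment suggested in Review #2:
# estimate how often the Lasso recovers the true support when
# y = f(X w) + noise, for a linear and a cubic link f.
# All parameter choices below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, s = 20, 3                       # ambient dimension, sparsity level
w = np.zeros(d)
w[:s] = 1.0                        # true support = first s features

def recovery_rate(link, n, trials=20, alpha=0.1):
    """Fraction of trials in which Lasso recovers exactly the true support."""
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        y = link(X @ w) + 0.1 * rng.standard_normal(n)
        coef = Lasso(alpha=alpha).fit(X, y).coef_
        if set(np.flatnonzero(np.abs(coef) > 1e-8)) == set(range(s)):
            hits += 1
    return hits / trials

# Plot-ready comparison: same sample sizes, linear vs. cubic link.
for n in (50, 200, 800):
    lin = recovery_rate(lambda t: t, n)
    cub = recovery_rate(lambda t: t**3, n)
    print(f"n={n:4d}  linear={lin:.2f}  cubic={cub:.2f}")
```

Plotting both curves on the same axes, as the review suggests, would make the gap in sample complexity between the linear and nonlinear cases directly visible.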