Paper ID: 654
Title: On the Statistical Limits of Convex Relaxations

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper follows a line of important recent work that seeks to understand the statistical limits of computationally tractable procedures, that is, lower bounds over all possible computationally efficient algorithms. This is in contrast to upper bounds obtained by analyzing a specific algorithm, or to algorithm- and complexity-independent lower bounds, typically obtained through information-theoretic reasoning. The present paper focuses on the limits of performance of a class of hierarchies of relaxations given by SDPs, specifically the sum-of-squares (SoS) hierarchy. Every fixed level of this hierarchy can be solved by an SDP. While the hierarchy is rich enough to express combinatorial problems, and hence can solve the problems under consideration exactly, this is only known to be possible at later levels of the hierarchy. This paper considers only fixed levels of the hierarchy (as the dimension of the problem grows). By direct construction, namely by exhibiting suboptimal solutions to the original problem that are feasible for the l-th level of the SoS hierarchy, the authors show for two specific problems (sparse principal submatrix estimation and stochastic block model estimation) that the optimal solution obtained from the SoS relaxation is bounded away, in quality, from the information-theoretic solution. They also demonstrate that an intractable estimator attains the bound given by the information-theoretic analysis.

Clarity - Justification:
The paper is quite technical, but the authors do a good job of explaining the key technical ideas, the relationship to past work, and the importance of the results.

Significance - Justification:
Understanding the statistical limits of computationally efficient algorithms is an important frontier. This paper makes a significant step toward such an understanding by examining a powerful class of relaxations.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This is an interesting addition to the line of work mentioned above. The constructive nature of the argument is additionally interesting; a minimal sketch of the kind of fixed-level relaxation at issue appears after these comments. The fixed-l condition is perhaps somewhat limiting, but the main results are compelling.
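To make the object under discussion concrete, the following is a minimal sketch of a fixed-level relaxation, assuming the standard degree-2 (first-level) SDP formulation for sparse principal submatrix estimation and the cvxpy library; the exact program and constraints analyzed in the paper may differ.

import cvxpy as cp
import numpy as np

def degree2_sdp(A, k):
    """Sketch of a degree-2 SDP relaxation: relax max_x <A, x x^T>
    over k-sparse unit vectors x by lifting to X = x x^T and keeping
    only convex consequences of the lift."""
    d = A.shape[0]
    X = cp.Variable((d, d), symmetric=True)
    constraints = [
        X >> 0,                  # PSD: necessary for X = x x^T
        cp.trace(X) == 1,        # surrogate for ||x||_2 = 1
        cp.sum(cp.abs(X)) <= k,  # convex surrogate for k-sparsity
    ]
    cp.Problem(cp.Maximize(cp.trace(A @ X)), constraints).solve()
    return X.value

# Usage: symmetric noise with a planted elevated k x k principal block.
rng = np.random.default_rng(0)
d, k = 30, 5
A = rng.standard_normal((d, d))
A = (A + A.T) / np.sqrt(2)
A[:k, :k] += 2.0                 # planted principal submatrix
X_hat = degree2_sdp(A, k)        # mass of X_hat should concentrate on the block

Higher levels of the hierarchy tighten these surrogates with additional moment constraints; the paper's point, as I read it, is that at any fixed level such a program remains bounded away from the information-theoretic rate.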
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
For two specific problems, the paper establishes information-theoretic lower bounds and presents a method whose upper bounds match those lower bounds.

Clarity - Justification:
The text and proofs are clear and easy to follow. One thing I really dislike is that the title is somewhat misleading. (I was expecting a far more general result; instead, the authors analyze two specific problems.)

Significance - Justification:
The contribution seems novel in the context of the two specific problems analyzed, but not as a general account of the "statistical limits of convex relaxations". The paper is somewhat interesting, but, as noted above, the title is misleading.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The main message of the paper is "we applied SoS to two particular problems and showed that the upper bounds for an intractable estimator match the information-theoretic lower bounds, but this is not the case for a tractable convex relaxation".

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper addresses a very interesting topic. While convex relaxations have received a lot of attention for tackling nonconvex problems, the paper claims that the computational tractability obtained via such relaxations trades off against statistical optimality. The paper justifies this claim for two specific application problems: 1. sparse principal submatrix estimation, and 2. edge probability estimation for the stochastic block model. For convex relaxation, the paper uses the hierarchy of "sum of squares" (SoS) relaxations. The developed theory confirms that, in a certain regime, there is a gap between the convergence rate attainable information-theoretically (conceptual) and the rate attained by the tractable relaxation (practical). The gap worsens as the order of the SoS relaxation decreases. The focus of the paper is on theory.

Clarity - Justification:
1. Some notation is undefined:
- What does [.] mean, e.g., [d] in line 152?
- What is the meaning of \psi_2 in the norm on line 163?
(The standard readings I am assuming for both symbols are sketched after this review.)
2. Section 5 (proofs) is too cluttered with mathematical detail, which makes the paper hard to read. Perhaps the paper could give a high-level sketch of the proofs and use illustrative graphs and figures to communicate the main ideas; detailed proofs could be moved to the supplementary appendix.

Significance - Justification:
The topic the paper addresses is of great significance, especially now that learning from high-dimensional data with sophisticated nonconvex models is becoming prevalent. In such settings, convex relaxations have been shown to be very promising, but their limitations are underrated or poorly understood. I believe understanding the limits of convex relaxations is a critical stepping stone toward improved learning algorithms.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
1. I wonder why the numerical results cover only the principal submatrix estimation task. These experiments can only verify the theorems in Section 4.1; there is no empirical evidence supporting the theorems of Section 4.2.
2. Is SoS the best convex relaxation for the application tasks studied in the paper? If not, then SoS becomes less relevant as the basis of the paper's arguments. If so, is there a convincing justification? In line 359 the authors refer to [25] to argue that the SoS relaxation works better than the Sherali-Adams or Lovász-Schrijver hierarchies. However, this does not imply that no other convex relaxation could do better than SoS. Please clarify!
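For reference, the standard readings assumed in the two notation questions above, which the paper should confirm against its own definitions:

% Standard conventions assumed in Review #3's notation questions.
[d] \;:=\; \{1, 2, \dots, d\}, \qquad
\|X\|_{\psi_2} \;:=\; \inf\left\{ t > 0 \,:\, \mathbb{E}\exp\!\left(X^2 / t^2\right) \le 2 \right\},

that is, [d] denotes the index set of the first d positive integers, and \|\cdot\|_{\psi_2} is the sub-Gaussian Orlicz norm, finite exactly when X has sub-Gaussian tails.
=====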