Paper ID: 1125 Title: Faster Convex Optimization: Simulated Annealing with an Efficient Universal Barrier Review #1 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): This paper simultaneously considers two fundamental problems in convex optimization: (1) minimizing a convex function given an evaluation oracle (or equivalently minimizing a linear function over a convex set given an evaluation oracle) and (2) design an interior point method using an efficiently computable universal barrier for any convex set. In particular they show an equivalence between the path followed by state-of-the-art algorithms for simulated annealing for convex function minimization and path following interior point methods using the entropic barrier which was recently shown to be universal by Bubeck and Eldan. As such, they both characterize cases where the running time of simulated annealing can be improved and address an open question regarding efficiently computing universal barriers. Clarity - Justification: This paper is well written and a pleasure to read. The main ideas come across quite clearly and the authors do a very good job of clarifying the relationship between two possibly complex areas of continuous optimization research, namely provable guarantees for simulated annealing and interior point methods. As such, I believe much of this paper is accessible even without a previous deep understanding of the continuous optimization techniques used. Significance - Justification: This paper makes an important connection between random walk based methods for simulated annealing and path following methods for linear programming. They show that certain random walk based methods for convex function minimization are closely related to a path following interior point method with the entropic universal barrier function. By making this connection they both improve the running time of simulated annealing for convex function of minimization for many convex functions as well as provide the first complete rigorous analysis of a polynomial time path following interior point method using a universal barrier function, thereby making progress on two fundamental open problems in convex optimization. While the results presented in this paper are important to the field of convex optimization, the significance of the running time improvements made, the analysis needed to make their connection, and the the novelty of the connection between interior point and simulated annealing, is debatable. Their algorithm yields a running time improvement only when the convex function being minimized admits certain structure that is difficult to characterize in general. Moreover, much of the analysis needed and used was either standard in the literature on simulated annealing, sampling, interior point methods, or discussed directly in the paper of Bubeck and Eldan which proved that the the entropic barrier was universal. Furthermore, the previous work of Bubeck and Eldan had even mentioned the fact that sampling techniques such as the ones used in the submitted paper could be used to make working with entropic barrier computationally efficient. Nevertheless, this paper does draw an even tighter connection between the sampling techniques and an interior point method with the entropic barrier then was shown in Bubeck and Eldan and thereby does obtain running time improvements for fundamental optimization problems in several regimes. As such this paper does constitute an advance in our understanding of both simulated annealing and the universal barrier and by providing such a strong and clean rigorous analysis, I have hope that this result may be used as a stepping stone for more work in the area. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Below is one more significant comment: - Line 27: Why doesn't using Lovasz Vempala '07 as suggested by Sebastien Bubeck and Ronen Eldan to sample from the entropic barrier distribution already give an efficiently computable universal barrier? Below are multiple more minor and specific comments, questions, and suggestions: - Line 27: Since the word is used multiple times, maybe be a little more clear what "efficient" means here and throughout. - Line 33: Remove the words "by now" - Line 42: I am not sure Karmarkar developed the first interior point method. I believe he gave the first proof of a polynomial time interior point method, but I think they may have existed before. - Line 110: Isn't it a little more than "first-order"? - Line 123: The self-concordance parameter v here depends on the convex set, right? Perhaps be clearer about that. - Line 143: For this problem to be tractable, don't you need an initial point inside the convex set? Or maybe even a ball in the set? - Line 176: I would be clear about what norm this is. 2-norm? - Line 446: Why isn't continuity more or less immediate? - Line 512: It seems odd to come up with a name for "iterative Newton algorithm." This algorithm is exactly the standard primal path following interior point method, right? ===== Review #2 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): The paper shows a form of equivalence between simulated annealing algorithms and interior point algorithms for convex optimization. More precisely, for a linear optimization problem over a convex body, the authors show that the curve traced by the mean point of the Boltzmann distribution equals the curved traced central path for the entropic barrier. Clarity - Justification: Given the space restrictions, the paper does a good job at concisely surveying both simulated annealing and interior point methods. I wish some of these background sections, and in particular those not strictly necessary to understand Propositon 1, had been left out of the main body of the paper. As it is, the new technical content is limited to the last two pages. The supplementary material is somewhat helpful, but less well-organized and more terse in its technical parts. Significance - Justification: Despite the lack of a clear application to ML in this paper, I find the connection between interior point methods, the entropic barrier (and its dual, the differential entropy) and simulated annealing very simple and elegant. I believe that this interpretation may lead to new ways of thinking about interior point methods and about simulated annealing (and random walks over convex bodies in general). In turn, these future results may be relevant to machine learning. For this reason, I think that the paper is significant for the ML community. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The main technical contribution, an improved cooling schedule for simulated annealing with running time dependent on the barrier parameter, is an interesting result in theory of convex optimization. The relevance of this result to Machine Learning is not immediately clear to me. The authors emphasize how the new schedule leads to a speed up in the running time of simulated annealing over the SDP cone, but fail to explain why we would want to use simulated annealing on the SDP cone, given that we can already apply the faster interior point methods. ===== Review #3 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): This paper signifies the equivalence of two approaches for convex optimization: (i) the celebrated interior point methods and, (ii) a specialized random-walk-based method for convex optimization: the simulated-annealing algorithm of Kalai & Vempala (2006). In particular, 1. The authors define the heat path, i.e., the curve formed by the means of the annealing distributions Pθ, as the temperature parameter is updated, and show its equivalence with the central path of interior point methods, where the entropic universal barrier function is used. 2. By such equivalence, one can use the advantages of each approach in improving the other. In particular, • Interior point methods have been mostly defined and used over convex sets with computable barriers. With this equivalence and given that random-walk based approaches apply also on problems with membership-oracle only “capabilities”, one can devise a randomized interior point method for any convex set, described by a membership oracle. • Interior point methods have a well-characterized update rule for the central path parameter t. By the above equivalence, the authors propose a new temperature schedule that improves the resulting running time of simulated annealing algorithms (i.e., in particular, one can substitute n with ν in the update rule, which might lead to improvements, depending on the convex set K). Clarity - Justification: The authors do a great job in describing succinctly the two “worlds” of interior point methods and simulated annealing algorithms. For each case, the authors clearly define the basic notions and how these lead to the equivalence between the two families of algorithms. Some suggestions regarding clarity are the following: • Section 1.1 contains information that is not used in the main text or, at least, appears at the very end. Maybe the authors could consider moving (and reducing) this section to save some space and help the flow of the paper. • It turns out that the contributions/new results in the main text fit into 1.5 - 2 pages. Moreover, some new results are presented in the appendix of the paper. There is a trade-off here: while one could say that moving new results in the main text would be preferable (“sacrificing” details in prior work), others would prefer this—more pedagogical—presentation of ideas. Bottom-line, the authors might consider moving some more technical parts in the main text. Significance - Justification: The paper is of great value as it conceptually brings together two seemingly different methods. To the reviewer’s humble opinion, key results/observations are the following: Given the convex conjugate of the log partition function of the exponential family A^∗(·), Proposition 1 first connects the mean of the distribution P_{θ}(·) with the interior point method, regularized by A^∗_{−}(·). Then, using the result by Bubeck & Eldan, one observes that A^∗_{−}(·) is a ν-self concordant barrier function. These two observations are key and definitely non-trivial that lead to paper’s main result. Having this result as a given, one can then consider techniques used in one or the other research area, in order to improve state-of-the-art ideas (this is at some level done in this paper). For this simple reason, I believe that this paper may trigger some further developments in these areas. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The following comments are not in priority order, but rather in top-to-bottom order, as ones reads the paper. 1. There are several typos in the text to be corrected (already in the abstract, there are two typos). Please make a careful pass for the final version. 2. First appearance of K is in Page 1, lines 76-77. Please introduce what K is. 3. One of the contributions of the paper is that, due to the connection with the simulated annealing approach, one can solve problems where only a membership oracle is available. It would be informative though to describe practical cases where only such an oracle is available. Any applications where the convex constraint problem of the problem is described by a membership oracle only? 4. Page 2, lines 123-125 and 130-140: the authors claim that the new temperature schedule for simulated annealing has running time O(\sqrt{ν}n^4), instead of O(n^{4.5}). That is, the two complexities differ by a factor O(\sqrt{\nu}{n}) 􏰁. By the result of Bubeck & Eldan (2014), ν ≤ n · (1 + o(1)), which implies that in the worst case ν O(\sqrt{\nu}{n}) = O(1), and the two complexities are “equivalent” up to constants. However, in many cases as ν the authors say, this factor is o(1): could you please enumerate 2-3 cases that this happens? Please insert a subsection describing these cases and put a reference pointing to that subsection. Moreover, the example in lines 137-140 is misleading: while the authors want to indicate that the entropic barrier has a better ν parameter in many cases, the example given still results in O(1) factors between the two algorithms, which “cancels” the argument in lines 134-136. 5. Page 3, line 255: “maximizing” or “minimizing” point x^∗? 6. Page 3, line 263: is t very close to 0? The intuition says that t starts away from zero and, via temperate updates, decreases to 0? Should t be from the beginning very close to 0 so that simulated annealing works? Please justify because this point is not clear to the reviewer. 7. There is no reference of Figure 1 in the text - also, the caption of Figure 2 should describe more of what is happening in the figures (what are the axes, what is the difference between subplots, etc). 8. Page 5, line 537: “maximizer” or “minimizer” of Φ_k? 9. Page 6, footnote lines 656-659: the authors could also note that finding such an initial point still satisfies the worst-case complexity, i.e., path-following methods have two phases: Phase I is the phase that gets us to the initial point \hat{x}_0 (using damped-Newton methods), Phase II is the one-Newton step + t update. 10. Page 6, lines 623: it might be useful to define what an ε-approximation solution is. Is it \hat{\theta}^\top \hat{x} − min_{x \in K} \hat{\theta}^\top x ≤ ε for \hat{x} being the ε-approximate solution? 11. Page 7, start of Section 4: this is a repetition of the main contributions in the Introduction. The authors could remove this part and insert some more interesting parts from the appendix. 12. Page 7, line 715: It would be helpful to show this or point to a specific Lemma in Rockafellar’s book. 13. Lemma 4 has no interpretation. Can you please elaborate on its significance? 14. Proof of Lemma 5 should be in the main text. Moreover, it is not apparent how the first equality in line 1119 holds. =====