Paper ID: 1286
Title: Epigraph projections for fast general convex programming

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper considers disciplined convex programming. It proposes transforming a particular class of convex optimization problems into a canonical form in which epigraph projections are the essential operations. Basic solvers are provided for the epigraph projection subproblems. Experiments show that the proposed general-purpose convex programming software can be orders of magnitude faster than another general-purpose package, CVX.

Clarity - Justification:
The paper is relatively easy to follow, if one is familiar with proximal mappings and ADMM. Theorem 1 is a bit involved, but should not cause much trouble.

Significance - Justification:
Although a general-purpose convex programming solver is an important and useful idea for the machine learning community, this paper does not convince me that the proposed methodology is necessary. Namely, the model problem (5) can easily be solved by ADMM, which is much simpler and does not require the non-intuitive conversion in Theorem 1; why should it be converted to (8) and then solved via epigraph projections, which is much more involved? Below I elaborate a little.

First, we add characteristic (indicator) functions of the inequalities to the objective function to remove the inequality constraints. Second, we further introduce an auxiliary variable z and replace Ax=b with Az=b and x_i=z, where x_i is the variable of f_i. If we wish, we can also add a characteristic function to the objective in order to remove the constraint Az=b. Third, we apply two-block ADMM to the equivalent problem, where z is the first macro-block, while y=[x_0^T, x_1^T, ..., x_p^T]^T is the second macro-block. We then find that we only need to compute the projection of some vector w onto the convex set {x | f_i(x) \leq 0}. This problem is (11) with t=0, which is simpler than the epigraph projection problem since t is KNOWN.
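To make this suggestion concrete, here is a minimal sketch (my own, not taken from the paper) of how such a sublevel-set projection could be computed by bisection on the Lagrange multiplier, assuming the proximal operator of lambda*f_i is cheap to evaluate; the quadratic test function and its prox below are illustrative placeholders:

import numpy as np

def project_sublevel(w, prox, f, tol=1e-10, lam_max=1e12):
    """Project w onto {x : f(x) <= 0}. If f(w) > 0, the projection is
    x(lam) = prox(lam, w) = argmin_x lam*f(x) + 0.5*||x - w||^2 for the
    multiplier lam >= 0 at which f(x(lam)) = 0; since f(x(lam)) is
    nonincreasing in lam for convex f, bisection suffices."""
    if f(w) <= 0:
        return w                        # w is already feasible
    lo, hi = 0.0, 1.0
    while f(prox(hi, w)) > 0:           # grow the bracket for lam
        lo, hi = hi, 2.0 * hi
        if hi > lam_max:
            raise ValueError("failed to bracket a feasible multiplier")
    for _ in range(200):                # bisect until the bracket is tight
        mid = 0.5 * (lo + hi)
        if f(prox(mid, w)) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * (1.0 + hi):
            break
    return prox(hi, w)

# Toy check with f(x) = ||x||^2 - 1, whose prox is w / (1 + 2*lam):
# projecting w = (3, 4) onto the unit ball should return (0.6, 0.8).
f = lambda x: float(np.dot(x, x)) - 1.0
prox = lambda lam, w: w / (1.0 + 2.0 * lam)
print(project_sublevel(np.array([3.0, 4.0]), prox, f))

The epigraph projection additionally optimizes over t, and this added complexity is exactly what I would like the authors to justify.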
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Besides the motivation issue I raised above, the proposed Epsilon system should also be compared with a purely ADMM-based system, e.g., the one described above, even though such a system may not yet exist.

I am a bit skeptical about the comparison results. I am not familiar with CVX. Does CVX involve inner loops for solving subproblems? If so, how precisely must the subproblems be solved? And for Epsilon, how precisely must the subproblems be solved? In most cases, even if the subproblems are solved to low precision, the outer loop can still converge well. So if the precisions of the inner loops are not identical or close to each other, I would deem the comparison unfair.

Minor comments:
1. For the image classification experiments on MNIST, what optimization problem is solved?
2. What does "or" in (7) mean? What do y_{\leq} and y_{\geq} mean (only y_{less}, y_{greater}, and y_= are defined)?
3. There are abundant English errors and typos.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a technique for fast epigraphical projection, which enables first-order algorithms for a large subclass of DCP problems.

Clarity - Justification:
The paper was very easy to follow and understand. Aside from a few minor comments, listed below, I thought the presentation was very good.

Significance - Justification:
General-purpose convex solvers that scale are a great contribution. The paper allows solving a range of problems, built with CVX, using first-order methods. If my understanding is correct, the subclass the approach can handle is "smooth or piecewise linear"; it would be nice to bring this out more clearly in the paper.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Please provide a clear description of the function class the approach can handle. The introduction states that you can handle ANY DCP function; however, the implicit dual Newton method appears to require Hessians of $f$, while the sum-of-max formulation requires a piecewise-linear objective. What does the approach do with Huber, for example? Huber is not on your list. What about all $C^1$ objectives? I think a clearer explanation would increase the impact of the paper.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a novel solver for the class of so-called disciplined convex problems (DCP), i.e., problems that can be defined by a set of atomic convex functions and a set of composition rules. The authors first show that any DCP can be converted to an equivalent minimization problem whose objective is a sum of convex functions (each of which is either an atomic function or an indicator function) subject to an affine constraint. Second, the authors show that the equivalent problem can be solved by the Alternating Direction Method of Multipliers (ADMM). The ADMM updates access the atomic functions only via two procedures: the proximal operator and the epigraph projection. The authors show how to solve these two projection problems for a wide set of atomic functions. The proposed solver is empirically compared with two other generic solvers based on converting the DCP to cone form and solving it via either an interior-point method or ADMM.

Clarity - Justification:
I have no objections.

Significance - Justification:
A generic convex solver is a great tool for prototyping new (not only machine learning) methods. Existing generic solvers are applicable only to small problems. Any progress towards a generic solver capable of dealing with reasonably large problems is very helpful.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The main problem is the experimental comparison with existing solvers. The authors report the running time needed to achieve a certain value of the objective function; however, the level of achieved precision (distance to the optimal value, duality gap, etc.) is not specified. Hence the comparison only shows that the proposed solver achieves an unknown level of precision much faster than the competing methods. It would be instructive to show the evolution of the objective function (or of the distance to the optimal value) as a function of running time for a couple of problems. It would also be useful to know whether the solver provides any certificate of optimality, or what kind of stopping conditions it uses.

Typos/errors
------------
- line 144: "which must is itself a linear..."
- line 259: missing bar over $A$ and $b$
- line 319: "(see. e.g. for a full derivation of ADMM ...)"; the sentence seems to be unfinished.
- line 339: "proximal operator these"
- line 373: "problem is concave is the single variable"

=====