Many thanks to all the reviewers for their comments. We will edit the paper to clarify these points, but we want to address a few key issues in the rebuttal.

First, Reviewers 2 and 3 had questions about the overall accuracy of the approach, namely whether we were fairly comparing accuracy levels against other approaches (Reviewer 2 also raises a question about "inner loop accuracy", which we address below). This is a key point, and we apologize that the presented results are not clear on it; we completely agree with the sentiment that if the accuracy levels are not comparable, then timing results have little value. So, to be explicit: in all reported timing results that do not explicitly list objective value (and specifically in the scaling examples), Epsilon achieves a _lower_ primal objective value than SCS in less time (the comparison to ECOS is trickier, since interior point methods achieve very high accuracy but do not produce useful intermediate results, but the point still holds). To illustrate this better, we propose to include mirror examples for Figures 2 and 3 (the lasso and support vector data description examples) that show accuracy versus runtime for a single problem size, available (anonymously) here:

http://dropproxy.com/f/D9F
http://dropproxy.com/f/DA0

The slightly higher objectives in Table 2 for MNIST (which uses a multi-class SVM loss, a point we will also clarify) and for sparse inverse covariance are due to slight differences in the stopping criteria of the two approaches, which we will describe more fully. In the revision we will tighten these conditions for Epsilon, showing equal objective values in still significantly less run time (2.53s for MNIST and 2.09s for sparse inverse covariance).

Second, for Reviewer 2, there appear to be several significant misunderstandings, so we want to make a few points very explicitly. The first key point, which is essential to the rest of the paper: contrary to what the reviewer states, we _cannot_ easily solve the optimization problem (5) using ADMM. The original DCP functions f_0, ..., f_n may _not_ have efficient proximal operators or projections (they can themselves be very complex compositions), so the ADMM minimization step over these functions is as hard as solving a general convex problem. The entire point of our transformations, stated explicitly in the theorem, is that the \bar{f}_i functions we transform to are either _atomic_ functions or epigraph constraints of atomic functions, which _do_ admit these efficient operations. Indeed, this is the entire premise of disciplined convex programming: complex convex problems are expressed as compositions of a relatively small set of atomic functions (see our background section on DCP and the references therein). So the proposed "pure" ADMM algorithm is simply not applicable to general problems here. That said, we agree that a few explicit examples, such as showing how the Robust SVM in Section 5 translates to DCP form, would illustrate this point much better, and we will include them.
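To make the role of the proximal operators concrete, below is a minimal consensus ADMM sketch in Python (ours, purely illustrative; it is not Epsilon's implementation, and all names are hypothetical). The algorithm needs nothing from each f_i beyond its proximal operator, and that step is cheap exactly when f_i is atomic; for a general DCP composition, the prox subproblem is itself a general convex program.

    import numpy as np

    def soft_threshold(v, kappa):
        # Closed-form prox of kappa * ||.||_1 -- an example of an atomic
        # function with an efficient proximal operator.
        return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

    def consensus_admm(proxes, n, rho=1.0, iters=500):
        # Consensus ADMM for: minimize sum_i f_i(x).
        # proxes[i](v, rho) must return argmin_x f_i(x) + (rho/2)*||x - v||^2.
        m = len(proxes)
        x = np.zeros((m, n))   # local copies of the variable
        u = np.zeros((m, n))   # scaled dual variables
        z = np.zeros(n)        # consensus variable
        for _ in range(iters):
            for i, prox in enumerate(proxes):
                x[i] = prox(z - u[i], rho)   # the step that must be efficient
            z = (x + u).mean(axis=0)         # consensus averaging
            u += x - z                       # dual update
        return z

    # Example: lasso, minimize 0.5*||A x - b||^2 + lam*||x||_1.
    A, b, lam = np.random.randn(20, 5), np.random.randn(20), 0.1
    ls_prox = lambda v, rho: np.linalg.solve(A.T @ A + rho * np.eye(5),
                                             A.T @ b + rho * v)
    l1_prox = lambda v, rho: soft_threshold(v, lam / rho)
    x_star = consensus_admm([ls_prox, l1_prox], n=5)

With these two oracles (alternating least squares and soft-thresholding), the loop specializes to exactly the "typical" lasso ADMM discussed next.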
Also (and we agree this point needs to be expanded; it is currently mentioned only briefly as an "optimization" in the proof), we note that in many "simple" cases, such as the lasso problem, Epsilon corresponds exactly to the "typical" ADMM implementation (i.e., alternating least squares and soft-thresholding, as in the sketch above), since this is precisely the optimized form that results from applying our rules. The innovation is that this applies to any DCP problem, given efficient operators for the atomic functions.

With regard to the "inner loop accuracy" of the methods: SCS uses an exact linear solver and exact cone projections, and ECOS uses an exact linear solver within an interior point method. Adjusting inner loop accuracy does not offer much benefit in these systems, and in general they are very mature solvers that have undergone substantial optimization, so we believe they offer a very strong baseline.

Finally, to Reviewer 1: your point about being more explicit regarding the class of allowable functions is well taken. You're right that the implicit dual Newton method requires smoothness and the sum-of-max method requires piecewise linearity of the optimality condition. But these are just two (albeit very common) classes of epigraph projections that we solve with these specialized approaches. Other classes, such as the l2 norm, are also handled by Epsilon (its epigraph is precisely the second-order cone, whose projection has a well-known closed form); and in the worst case, we can compute the epigraph projection from the proximal operator by bisection, as sketched below. So solving "any" DCP problem really means solving any DCP problem whose atoms have efficient proximal operators. We will edit the paper to clarify this point, and also relate it to objectives such as the Huber loss (which in fact also has an efficient epigraph projection).
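For concreteness, here is a minimal sketch of the generic prox-plus-bisection fallback mentioned above (ours, purely illustrative; f and prox_f are hypothetical oracles, and Epsilon uses the specialized routines where they apply). It relies on the optimality conditions for projecting (v, s) onto epi f = {(x, t) : f(x) <= t}: the solution has the form x* = prox_{lam*f}(v), t* = s + lam, where lam >= 0 solves the scalar equation f(prox_{lam*f}(v)) = s + lam, whose residual is nonincreasing in lam.

    import numpy as np

    def proj_epigraph(f, prox_f, v, s, lam_max=1e6, tol=1e-9):
        # Project (v, s) onto epi f = {(x, t) : f(x) <= t}.
        # prox_f(v, lam) must return argmin_x lam*f(x) + 0.5*||x - v||^2;
        # we assume the root lies in [0, lam_max].
        if f(v) <= s:
            return v, s        # (v, s) is already in the epigraph
        lo, hi = 0.0, lam_max
        while hi - lo > tol:
            lam = 0.5 * (lo + hi)
            if f(prox_f(v, lam)) - s - lam > 0:
                lo = lam       # residual still positive: increase lam
            else:
                hi = lam
        lam = 0.5 * (lo + hi)
        return prox_f(v, lam), s + lam

    # Example: projection onto the l1-norm epigraph, reusing the
    # soft-thresholding prox from the earlier sketch.
    f = lambda x: np.sum(np.abs(x))
    prox_f = lambda v, lam: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    x, t = proj_epigraph(f, prox_f, np.array([3.0, -2.0]), 1.0)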