We thank all the reviewers for their time and effort in reviewing our paper! Below we address a few major concerns.

Q Reviewer_10: How do the authors comment on the original SVRG paper, which states that "its results can be extended to locally convex functions"?

A: We believe this is possible under certain types of strong assumptions. For instance, if $f$ is locally convex on the set $X := \{x : \|x - x^*\| \le \delta\}$ for some reasonably small $\delta$, then one can add a barrier constraint by setting $\psi(x) = +\infty$ if $x$ is outside $X$ and $\psi(x) = 0$ otherwise (a display-form version of this construction appears at the end of this rebuttal). Proximal SVRG is then guaranteed to converge to the local minimum when started from a point in $X$. More generally, we believe most convex optimization algorithms generalize in this way to locally convex settings. However, this setting is VERY different from the main focus of our paper. Indeed, for highly nonconvex problems such as neural networks, most of the training time is spent BEFORE reaching a locally convex region, if one exists at all. We therefore believe that providing faster running times without local convexity is not only more challenging, but also more meaningful.

Q Reviewer_13: How is Eq. (4.7) deduced from Eq. (4.6)?

A: Thanks a lot! The reviewer indeed caught a typo in the high-level proof sketch of our submitted manuscript (but not in the full proof in the appendix). When deducing Eq. (4.7) from Eq. (4.6), the expectation on the right-hand side, i.e., $\mathbb{E}[f(x_0) - f(x_m)]$, needs to be replaced with
$\mathbb{E}\big[\, f(x_0) - f(x_m) - \tfrac{\eta}{6 m_0}\, \blacknabla_{0:m-1}^2 \,\big]$.
However, this additional term is negligible: if one substitutes the corrected (4.7) into Eq. (5.1), one only loses a tiny constant, because $\eta/(6 m_0) \ll \eta$. As a result, replacing $\eta/3$ in Eq. (5.2) with $\eta/4$ simply fixes this typo.

Q Reviewer_13: In the experiments, did the measure of a "pass" include the additional computation of the full-data gradient by SVRG each pass?

A: Yes, we have included that cost when plotting the performance (the exact accounting is sketched at the end of this rebuttal), and we will make this explicit in the revision.

Q Reviewer_13: It is difficult to judge the significance, as the paper makes a very heavy set of assumptions in the main body of the paper.

A: Note that we make only ONE assumption throughout the paper: the smoothness of each $f_i(x)$. This is a minimal assumption. The rest (what we call simplification assumptions) are only for the sake of readability, so that the proof techniques can be explained cleanly.
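For concreteness, the barrier construction from our answer to Reviewer_10 can be written in display form. The proximal step shown is the standard one from proximal SVRG; the symbol $v_k$ (the usual SVRG variance-reduced gradient estimator) is our illustrative notation here, not notation fixed in the paper:

\[
\psi(x) \;=\;
\begin{cases}
0 & \text{if } \|x - x^*\| \le \delta,\\
+\infty & \text{otherwise,}
\end{cases}
\qquad
x_{k+1} \;=\; \operatorname{prox}_{\eta\psi}\!\left(x_k - \eta\, v_k\right)
\;=\; \arg\min_{y}\Big\{ \tfrac{1}{2\eta}\,\|y - (x_k - \eta v_k)\|^2 + \psi(y) \Big\}.
\]

Since $\psi$ is the indicator of a ball, the proximal step reduces to a Euclidean projection onto $X$, so each iteration stays as cheap as a plain SVRG step.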
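To make the pass accounting for Reviewer_13's question concrete, below is a minimal Python sketch of how effective passes can be counted. The helper name effective_passes is hypothetical (this is not our experiment code), and the factor of two per inner step assumes the standard SVRG estimator, which evaluates two component gradients per inner iteration.

    # Hypothetical sketch of SVRG pass accounting; not the paper's experiment code.
    def effective_passes(num_epochs: int, n: int, m: int) -> float:
        """Effective data passes after num_epochs SVRG epochs.

        n -- number of training examples (one pass = n component-gradient evaluations)
        m -- number of inner-loop iterations per epoch
        """
        # Each epoch pays one full-data gradient (n evaluations) for the snapshot,
        # plus two component-gradient evaluations per inner step.
        evals_per_epoch = n + 2 * m
        return num_epochs * evals_per_epoch / n

    # Example: with m = n inner steps, each epoch is charged 3 effective passes.
    print(effective_passes(num_epochs=1, n=10_000, m=10_000))  # -> 3.0

Under these assumptions, this is the quantity one would charge on the x-axis; as stated above, our plots do include the full-gradient snapshot cost rather than treating it as free.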