Paper ID: 341
Title: SDCA without Duality, Regularization, and Individual Convexity
Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper extends Stochastic Dual Coordinate Ascent (SDCA) so that no explicit regularization or duality is needed. Even when the individual losses are not convex, the paper still manages to establish linear convergence rates, as long as the expected loss is strongly convex.
Clarity - Justification: Clearly written
Significance - Justification: See below
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Mathematically speaking, the results presented are interesting. My major concern is whether the method is significant for solving machine learning problems. Unfortunately, the paper didn't show any experimental results. It points to deep learning when stressing that f_i can be non-convex, so it would be helpful to compare the empirical performance against existing methods for training deep nets. In Sec 2.2, where each f_i is convex but there is no regularization and the expectation is strongly convex, it would be useful to show some machine learning models that fall into this category.
=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): In this paper the authors analyze the SDCA algorithm without duality. It is shown that this algorithm is essentially an instance of SGD with a variance-reduction feature (an idea similar to SVRG, SAGA, ...); however, it can be obtained much more cheaply. It is also shown that this theory applies to loss functions that may not be convex (the only requirement is that the total objective function be convex). The paper concludes with a note about acceleration.
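For concreteness, the update summarized above can be sketched as follows. This is a minimal illustration of the dual-free SDCA idea on a ridge-regression toy problem, assuming squared losses f_i(w) = 0.5(x_i·w - y_i)^2; all function names, the step size, and the epoch count are illustrative choices, not the paper's exact prescription:

```python
import numpy as np

def dual_free_sdca(X, y, lam, eta, epochs, seed=0):
    """Sketch of dual-free SDCA for
    min_w (1/n) sum_i 0.5*(x_i.w - y_i)^2 + (lam/2)*||w||^2.
    Pseudo-dual vectors alpha_i are maintained so that
    w = (1/(lam*n)) * sum_i alpha_i holds throughout."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros((n, d))           # pseudo-dual variables, one per example
    w = alpha.sum(axis=0) / (lam * n)  # primal iterate, kept in sync with alpha
    for _ in range(epochs * n):
        i = rng.integers(n)
        grad_i = (X[i] @ w - y[i]) * X[i]  # gradient of f_i at w
        v = grad_i + alpha[i]              # variance-reduced direction
        alpha[i] -= eta * lam * n * v      # pseudo-dual coordinate update
        w -= eta * v                       # primal update (preserves the relation)
    return w
```

At the optimum, alpha_i converges to -grad f_i(w*), so the direction v vanishes in expectation and in variance, which is the variance-reduction mechanism the summary refers to; no dual objective is ever evaluated.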
Clarity - Justification: The paper was easy to read and follow. The paper is very short - just 4 pages. It would be nicer if the paper contained some numerical experiments, with the proofs moved to supplementary material.
Significance - Justification: I personally like the idea that reduced-variance SGD can also be achieved in this way (as presented in the paper). In SVRG, for example, I find it very hard to choose the learning rate and the inner-loop size in practice. It seems that this paper has overcome this problem, as the learning rate can be chosen easily (e.g., if we know the strong convexity parameter, which we do know if we regularize the objective function). Also, the learning rate is ~1/L, as expected.
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): I like the paper. One thing I do not like is Section 2.3, where the accelerated version is discussed. From my point of view, this section is somewhat weak and unfinished. If we consider the work on Accelerated SDCA (Shalev-Shwartz & Zhang, 2015), the subproblems there have to be solved to a certain error. However, the SDCA without duality proposed in this paper can produce iterates (possibly close to the optimum) that will NOT be feasible for the dual problem. Hence it is not clear to me how Accelerated SDCA would work here. I would therefore suggest either stating some comments and details about this or removing those remarks.
=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): The paper proposes a variant of the SDCA optimization method which does not require duality or explicit regularization. In addition, the method is proven to work even with component loss functions which are non-convex. The method is analyzed theoretically to prove its linear convergence rate.
Clarity - Justification: The page budget could have been utilized much better to provide a stronger motivation for why the proposed methods are important.
Significance - Justification: The assumptions and features underlying the proposed method seem important in practical situations. Handling non-convex component losses seems a good step forward to me, as several commonly used and popular models these days rely on non-convexity. In addition, controlling model complexity and enforcing the correct regularization seems to be a key challenge.
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The paper does a good job of highlighting the importance of the proposed method and is written reasonably well to follow. However, the motivation for why the new methods would work well in practice is not provided. The page budget could have been utilized much better, with at least some proof-of-concept experiments on synthetic datasets to illustrate that the methods do in fact work. It is not clear how the proposed SDCA without regularization compares against the regularized versions in terms of the quality of the solutions achieved. The ideas could be more strongly motivated so that the need for them is clear to readers.
=====