We thank all reviewers for their helpful comments. Regarding experiments: the focus of our paper is the theoretical derivation and analysis of the algorithms. That said, close variants of these algorithms, including the k>1 case and random initialization, were studied empirically in Shamir (2015). We will add a clearer discussion of these empirical results. We also thank Reviewer 1 for catching a typo, which we will fix.