Dear Reviewers,

Thank you for investing the time and energy in reviewing our paper and for sharing your comments with us.

This paper straddles two sub-areas of CS: machine learning and databases. Catering to both crowds made it more difficult to write, and the reviews reflect that:

* "More details of all the formulas/equations are expected"
* "In a sense, the discussion on machine learning are sometimes too didactic"

We are aware of this. A database-oriented researcher would find some parts rudimentary and others hard to follow; the same is probably true for the ML expert (about different parts, of course). Given the strict page limit, we tried to strike a balance between the two communities. We believe a dedicated reader from either field should be able to follow the derivations.

More specific responses to comments:

About training time: the running time of the algorithm is dominated by a simple aggregation procedure, which could be done by a single SQL query or one map-reduce job. The running time therefore depends almost solely on the underlying data system, not on the algorithm itself, and such a comparison, we feel, would be outside the scope of this paper. As a remark, this is also why line 6 of the algorithm performs the simplest possible operation; accelerating it would not significantly change the runtime.

About comparing with other methods: other sampling methods were indeed experimented with. As detailed in the experimental section, those require significant tuning and data insights, which is something we struggled with. For the DBLP dataset, for example, stratified-sampling-based approaches ended up performing _worse_ than uniform sampling (including Joshi and Jermaine 2008). We eventually managed to find other heuristics that improved on uniform sampling, but those diverged significantly from prior art. The presented experimental results are our best attempt at providing reproducible results.
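To make the aggregation claim above concrete, here is a hypothetical sketch (not the exact statistic our algorithm collects) of the access pattern we mean: a single linear scan with constant work per row, equivalent to `SELECT key, SUM(value) FROM data GROUP BY key`:

```python
from collections import defaultdict

def aggregate(rows):
    """Single-pass group-by sum over (key, value) pairs.

    Hypothetical stand-in for the aggregation step: the actual
    statistic the algorithm collects differs, but the access
    pattern (one linear scan, constant work per row) is the same.
    """
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)
```

Because such a scan is embarrassingly parallel, the same computation maps directly onto one map-reduce job, which is why wall-clock time is governed by the underlying data system rather than by the algorithm.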
With respect to comparing to distributed/parallel querying mechanisms: this would be very interesting, especially in the case where the limiting factor is the query response time. In this paper, we focus on data size being the bottleneck (which is often the case). In that setting, sampling is the industry standard, and we show how to improve on that procedure. We agree with the reviewer that, if the entire data can be indexed on distributed servers, that would potentially be a preferred solution.

Drawing the parallels with importance sampling is a very good idea; we did not think about that. We believe the reviewer is right to observe that our algorithm could be thought of as an extension of importance sampling. Optimizing for "worst reconstruction" or other cost functions (not squared loss) is possible; we comment on that in the concluding remarks. Nevertheless, we could not include those results within the space limit. We hope others will build on our framework and provide optimizations for other cost functions and settings.
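For readers less familiar with the connection, a minimal importance-sampling estimator (a generic textbook sketch, not our algorithm) estimates a population sum by drawing rows with unequal probabilities and reweighting each draw by the inverse of its probability; per the reviewer's observation, our method can be viewed as an extension of this idea:

```python
import random

def importance_sample_sum(values, probs, k, seed=0):
    """Estimate sum(values) from k samples drawn with probabilities probs.

    Each sampled value is reweighted by 1/p, which makes the estimator
    unbiased. Generic sketch for illustration only; the sampling
    probabilities here are assumed given, whereas a tailored scheme
    would choose them to reduce the estimator's (squared-loss) error.
    """
    rng = random.Random(seed)
    indices = range(len(values))
    estimate = 0.0
    for _ in range(k):
        i = rng.choices(indices, weights=probs, k=1)[0]
        estimate += values[i] / probs[i]
    return estimate / k
```

With uniform probabilities this reduces to plain uniform sampling; skewing the probabilities toward influential rows is what lowers the variance.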