We thank the reviewers for their helpful comments.

===Assigned_Reviewer_6===

Thank you for your strong support; we truly appreciate your positive feedback.

===Assigned_Reviewer_7===

Thank you for your feedback. There may be a mismatch between your interests/expertise and the topic of our paper. However, we wrote for a broad audience, and in fact both of the other two reviewers rated the paper as “Excellent (Easy to follow)” for clarity. We hope our responses below will help.

>> “I did not catch the significance of the paper.”

We reiterate our contributions here, which will hopefully clarify the significance of our work:

1. We are the first to provide a rigorous theoretical analysis and explicit sample complexity bounds for structured parameter recovery from data aggregated into first-order moments of both the features and the target variables. We show that, under standard conditions, the true parameter can be recovered from aggregated empirical moments alone with high probability.

2. We extend our analysis to cover approximation effects, such as noise in the collected data or histogram aggregation, and show that parameter recovery is still possible within an arbitrarily small tolerance.

3. In the bigger picture, our work extends existing results in the compressed sensing literature by providing guarantees for exact and approximate parameter recovery in the case where the noise in the sensing matrix and the measurement vector are linearly correlated, which may be of independent interest. Such scenarios have not been studied in the literature, and existing results on related cases provide only approximate recovery guarantees, in contrast to our work, which provides exact recovery guarantees.

We emphasize that ours is the first such analysis for structured parameter recovery from aggregated data of ANY kind.
In particular, our main results have not been shown in more than 60 years of ecological data analysis, dating at least to [Goodman, 1953], with parallel work in compressed sensing [Candes and Tao, 2006] and renewed interest in machine learning [Park and Ghosh, 2014; Bhowmik et al., 2015].

>> “What is real meaning of aggregated data in this paper?...”

As explained throughout the paper, “aggregate” means that summary information (such as the mean) is provided instead of the actual values for at least some of the variables. It does not mean “data are not collected in one time”, which would suggest a data streaming scenario.

>> “...i do not see why semi-supervision is an important issue and how does the proposed approach solve/avoid this issue.”

As we note in the paper, our key motivation for studying this problem is that aggregated data is ubiquitous across applications and domains, including such diverse fields as healthcare, sensor networks, sociological surveys, and IoT, where data is reported in aggregated form in the interest of privacy, scalability, robustness to interference, etc. While increasingly prevalent, the kind of semi-supervision this provides has been little understood or exploited until now, and the existing literature remains extremely limited, as we elaborate in our section on related work. Our proposed approaches exploit structural properties of the data aggregation procedure and provide the first guarantees for exact parameter recovery with high probability, even when the only known information about the data is a set of empirical first-order moment estimates. As such, we expect our results to serve as a baseline for future work on this topic, and we hope they spur the community's interest in this important problem.

===Assigned_Reviewer_8===

Thank you for your positive feedback; we greatly appreciate your supportive comments.
>> “...It would be helpful if the authors can provide intuitions to clarity why their results can be trivially generalized to nonuniform cases.”

Thanks for pointing this out. The extension to non-uniform group sizes can be done in a number of ways by slightly modifying the proof technique. For example, suppose the aggregates in the $i^{th}$ group had been computed from $n_i$ data points, for $i = 1, 2, \ldots, k$. Then the quantity $kd\exp(-Cn)$ in the proof of Lemma 2.3 in our supplement is replaced by $d\sum_i \exp(-Cn_i)$, where $C = \theta/(2kd\sigma^2)$. This can then be used directly in the subsequent proofs, or approximated/upper bounded using standard techniques for better readability (for example, by using the minimum of the $n_i$ over $i$). Ultimately, our main result (that exact structured parameter recovery is possible from aggregated data with high probability) still holds, albeit with a slightly different sample complexity expression. We shall include a more elaborate explanation of this in the final version.

>> “Same sentence appeared twice”

Thanks for pointing this out; we shall correct it in the revision.
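The upper-bounding step via the minimum group size can be sketched as follows (writing $n_{\min}$ for the minimum, a label introduced here for illustration; since $\exp(-Cx)$ is decreasing in $x$, each summand is largest at $n_{\min}$):

```latex
d\sum_{i=1}^{k} \exp(-C n_i)
\;\le\; d\sum_{i=1}^{k} \exp(-C n_{\min})
\;=\; kd\,\exp(-C n_{\min}),
\qquad n_{\min} := \min_{1 \le i \le k} n_i .
```

This recovers the uniform-case expression $kd\exp(-Cn)$ with $n$ replaced by $n_{\min}$, so the subsequent proofs go through with the corresponding change in the sample complexity expression.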