REVIEWER 5:
1- Theoretically, Section E.1 and Propositions 3 & 4 show that the performance of the empirical-based estimator depends on the shape of the underlying distribution P. Indeed, the closer P is to a uniform distribution, the worse the performance. Similarly, the performance of other estimators (the maximum likelihood estimator, the projected estimator, etc.) also depends on P. Therefore, the choice of the optimal estimation scheme depends on the shape of the distribution. Intuitively, estimators unavoidably encode an implicit prior over their outputs.
2- As an example use case, consider Chrome's use of O-RAPPOR to track home page settings. The application is built and deployed without knowledge of the candidate set. At analysis time, an auxiliary source of server-side data is used to generate a candidate set, e.g. based on popular URLs (available e.g. via Google.com usage statistics) and malicious URLs (available e.g. via explicit abuse reports from users or from malware tracking labs). This candidate set need not be precise, and it can be continually updated without updating the clients.
3- We chose a moderate S due to space and time limitations. The general outcomes are similar. We will include the results for S = 4096 in the final version. We will fix the typos.

REVIEWER 6:
1- Erlingsson'14 introduced RAPPOR but did not study its performance theoretically, prove its optimality in the high privacy regime or its non-optimality in the low privacy regime, or compare it to other schemes (such as RR). Kairouz'14 introduced RR for closed, k-ary alphabets in a hypothesis testing context.
Our contributions are: (1) we apply RR to distribution estimation; (2) we show that it is the optimal mechanism for binary alphabets (for all loss functions and all privacy levels); (3) we extend RR to open alphabets and propose an empirical-based decoder for it; (4) we show that for k-ary alphabets, RR is order-optimal in the low privacy regime and RAPPOR is optimal in the high privacy regime (this also shows that the dependency on the alphabet size (Proposition 1) vanishes in the low privacy regime); (5) we show that for open alphabets, RR beats RAPPOR in both regimes. Please refer to the "Our contributions" subsection for more details.
2- O-RR first projects an S-ary alphabet onto a k-ary alphabet via a randomly constructed hash function. O-RR then applies k-RR to produce a privatized, hashed output. Under O-RR, the loss is due to both hashing and privatization. When the number of users is large, the losses due to hashing can be overcome by using more cohorts (more hash functions). When epsilon is small (high privacy regime), local differential privacy comes at a cost proportional to the size of the alphabet (see Proposition 1). Therefore, if we simply use plain vanilla RR on the original alphabet (of size S), the loss due to differential privacy will be severe in this high privacy regime. However, O-RR can project the S-ary alphabet onto a much smaller alphabet (of size k) and then use cohorts to eliminate the losses due to hashing. This explains why O-RR outperforms k-RR (and RAPPOR) in all privacy regimes. k-RAPPOR introduces noise in each coordinate independently, and the empirical estimator estimates each coordinate independently. Consequently, the performance of k-RAPPOR is largely independent of the choice of k, but by the same token, O-RAPPOR cannot derive much advantage from using cohorts to reduce k. We will clarify this in the final version.
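To make the O-RR pipeline described above concrete, here is a minimal sketch of the encoding side. The function names and the SHA-256-based per-cohort hash are our own illustration, not the paper's exact construction; cohort assignment and server-side decoding are omitted.

```python
import hashlib
import math
import random

def k_rr(x, k, epsilon):
    """k-ary randomized response: report the true value x (in 0..k-1)
    with probability e^eps / (e^eps + k - 1); otherwise report one of
    the other k-1 symbols uniformly at random."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return x
    # Pick uniformly among the k-1 symbols different from x.
    other = random.randrange(k - 1)
    return other if other < x else other + 1

def o_rr_encode(item, cohort, k, epsilon):
    """O-RR sketch: project an open-alphabet item onto {0,...,k-1} using a
    cohort-specific hash, then privatize the hashed value with k-RR.
    Different cohorts use different hash functions, which is what lets the
    server average out hash collisions when the number of users is large."""
    h = hashlib.sha256(f"{cohort}:{item}".encode()).digest()
    bucket = int.from_bytes(h[:8], "big") % k
    return k_rr(bucket, k, epsilon)
```

A client would report only `(cohort, o_rr_encode(item, cohort, k, epsilon))`; the reported bucket is a valid symbol in 0..k-1 regardless of the size of the original alphabet, which is the point of the projection step.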
3- The low privacy regime offers users plausible deniability (as opposed to strong privacy guarantees). For relatively large values of k, k-RR outputs the original data value with probability roughly 1/2. There is thus always a roughly 50% chance that what the service provider collects from a user does not match the original data. In many applications, data is collected to enable a specific decision. In such settings, the nature of the decision frequently determines the required level of utility, and the number of reports to be collected n is predetermined by the size of the existing user base. Thus, the differential privacy practitioner's role is often to offer users as much privacy as possible while still extracting sufficient utility at the given n.

REVIEWER 7:
1, 4-7- Thanks. We will address these comments in the final version.
2- The results in 4.3 hold for both losses. Unfortunately, we could not find a clean analog to Proposition 5 for the l_1 loss. However, from Propositions 3 & 4, the l_1 loss behaves similarly to the l_2^2 loss. We will incorporate your comments in the final version.
3- Proposition 1 refers to the best possible risk and holds irrespective of Q. We will remove Q from the statement of the proposition.
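As a quick sanity check on the 1/2 figure quoted in the plausible-deniability argument above, the probability that k-RR reports the true value is e^epsilon / (e^epsilon + k - 1); in the low privacy regime epsilon ~ ln k this equals k/(2k-1), which approaches 1/2 as k grows. A small sketch (the helper name is ours):

```python
import math

def true_report_prob(k, epsilon):
    """Probability that k-RR reports the user's true value."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)

# In the low privacy regime epsilon = ln k, this is k / (2k - 1) -> 1/2:
for k in (16, 256, 4096):
    p = true_report_prob(k, math.log(k))
    print(k, p)  # approaches 0.5 from above as k grows
```

So even at epsilon = ln 4096 (a nominally "low privacy" setting), a collected report fails to match the user's true value about half the time.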