We thank the reviewers for their time, and also for their feedback, which has helped us improve our paper.
Both reviewers 1 and 2 expressed a desire to see more experiments that validate the relationship between delay and bias.  To address this, we will add an experiment, similar to Figure 4, that measures bias as a function of delay; this experiment will show that our bias results from Section 4 match practical outcomes.

Reviewer 1:

We thank R1 for their time and effort in providing particularly detailed feedback.

We agree with R1 that the idea of sparse estimation time, together with an O(n) rate for synchronous Gibbs, is an interesting contribution in itself, and a shift in thinking about convergence of Gibbs samplers.  We will add text to the introduction and to Section 4 to highlight this, and to explain that sparse variation distance behaves in this way because it uses a more local view of the chain than traditional metrics.

In Section 5, R1 observed that the presentation, and use of $\bar \pi$ to mean the biased distribution, is confusing.  R1 also correctly noted that, for a stationary distribution $\bar \pi$ to exist and for our results to make sense, the delay conditions on the machine must be stationary.  We agree with R1 that this is a practically unrealistic condition.  It turns out that the need for these conditions is an unfortunate artifact of the way we presented the results, and not fundamental to the theory.  To address this, we will modify our presentation to apply to the case where no stationary distribution necessarily exists (by replacing the question “how close are we to the stationary distribution?” with “how much do our samples depend on initial conditions?”), thereby removing both the confusing references to $\bar \pi$ and the assumption of stationarity of delays.

R1 states that Lemma 5 is interesting in its own right, and gives better insight into the "accuracy barrier" than Theorem 1.  Upon reflection, we agree.  We will present Lemma 5 in place of Theorem 1 in the body of the paper, and will explain how it illustrates the bias term.  We will keep Claim 1 as the only result that talks directly about sparse estimation time, and move Theorem 1 to the appendix.

R1 notes that the value of Section 5 is initially unclear.  To address this, we will add text to the beginning of Section 5 to motivate what we are doing more quickly.  In particular, we will emphasize that the bias is often small in practice, and that we still need the mixing time to know how long to run the algorithm.

R1 correctly notes that Lemma 2 is an equivalent way to define $\alpha$.  This definition is used in the appendix because it makes the proofs easier; the definition given in the body of the paper (Definition 4) is used because it is more intuitive and does not require introducing the concept of a coupling.  We will amend the appendix to make this clear.

R1 asked whether it is surprising that the stationary distribution is affected by errors caused by reading stale data.  We included this statement because several people we spoke with while writing the paper, including practitioners who use asynchronous Gibbs, expressed the opposite hypothesis.

R1 asked about multi-model Gibbs: this consists of multiple threads with a single execution of Gibbs sampling running independently in each thread.  We will add text to Section 6 to make this clearer.

Reviewer 2:

R2 notes correctly that there is a gap between the counterexample (degree O(N)) and the theoretical results (degree O(1)). Rather than being problematic, this gap makes us hopeful that future work will provide theoretical results for intermediate cases (such as logarithmic degree). For clarity, we will add prose explaining this gap to the paper.

R2 asked about threads in Figure 4: the results in Figure 4 are from a simulation that uses the statistical model in Section 3.  It does not use threads, but rather simulates delays directly based on a maximum-entropy model given $\tau$.

R2 asks how changes in the number of threads would affect the delay parameter. The delay parameter is a complicated function of the number of threads, the workload, and the underlying machine. While empirical results suggest that it is small ($\tau < 10$) for many practical problems, exploring the relationship fully would require a systems analysis that is beyond the scope of this paper.  However, for clarity, we will try to explain the relationship by citing previous work on Hogwild SGD that has analyzed the $\tau$ parameter.

Reviewer 3:

R3 expressed interest in how the asynchronous update can be modified to gain a better mixing rate beyond Dobrushin’s condition. It is our hope that future work on analyzing asynchronous Gibbs using spectral methods will yield more general algorithms and rates that go beyond Dobrushin’s condition.