Reviewer 7$
To clarify, subtree queries are performed using the current state of the sampler, not independently.

At any given time, the current tree satisfies all triplets supplied by the user. When a new triplet is added,
the current tree is restructured to accommodate the triplet, effectively starting a new Markov Chain. 
Aperiodicity holds in the presence of triplets regardless
of when or how the triplets are obtained -- we show that the state space of the Markov chain remains strongly connected when limited to moves
that respect triplets.

In terms of the variance, the idea is to estimate it at the current posterior. Thus we sample a fixed number of trees from the posterior and then compute
the variance with respect to these.

We greatly appreciate this feedback and will do our best to make these aspects of our paper more clear in the writing.

In the experiments, the metric of quality used is the fraction of violated triplets (if we were to enumerate all possible triplets that the user might
supply). This is a user-centric quality metric, unlike say log-likelihood. It would be interesting to investigate other such user-centric metrics.

Reviewer 8

The comments about improving the experimental evaluation are much appreciated. We thank the reviewer for the suggestion of comparing 
the results we obtained to the results that would be obtained using user-feedback alone (i.e. ignoring geometry). This will be a great addition
to the paper. 

The question of which types of queries are reasonable to ask a user is crucial to the success of interactive machine learning, and in our view is a 
big open area. Many of our choices have been informed by considerations of this type. For instance, although triplet constraints are common,
the usual practice is to obtain these using triplet queries, in which the user is supplied with three items a,b,c, and has to specify which one is the
"odd one out". We expect that such queries will be very hard to answer in general (for instance --- which is the odd one out for "bear, squirrel, donkey"?).
Thus we supply the user with an entire subtree of 10 items, and the user need only identify one incorrect triplet, which should be much easier.

Reviewer 9

Indeed, there is a lot more theory to be worked out for this problem and it is quite nontrivial. The questions asked by the reviewer are very 
important ones, and would each require substantial mathematics.

To give an example, the reviewer asks whether the Markov chain is rapidly mixing. Now, the chain we are dealing with is based on three
things: the tree, the data points, and the constraints. Let's move to the very simplest case, where there is no data and no constraints, and
we are merely trying to sample a tree at random. Even for this seemingly elementary situation, it has been challenging to analyze the 
mixing time of a Markov chain that is even simpler than ours (one that only moves leaves around rather than entire subtrees). In a seminal
paper, Aldous showed an n^3 bound on the mixing time; a decade later, this was improved to n^2 by Schweinsberg. What happens when
data is added to the mix? We intend to find out, but it won't be easy!