We thank all the reviewers for their helpful remarks. We plan to address all the minor issues pointed out in the reviews, and take this opportunity to briefly respond to some of the more substantive critiques.

Reviewer 1: We do show, as part of our simulations, that running an off-the-shelf no-regret algorithm can result in linear regret. However, we agree with the reviewer that a more direct explanation earlier in the paper would prevent any potential confusion. The reviewer asked us to compare our work to Amin et al. There are several differences, including: (1) they study the problem from the perspective of the seller, not the buyer; (2) their model has a single buyer; (3) they define regret for the buyer and the seller with respect to different benchmarks; (4) they assume the buyer's valuation is subject to time discounting, and their regret bounds depend on the discount factor, with non-trivial regret achieved only when the discount factor is strictly less than 1. Regarding the reviewer's comment on the seller's reaction to our algorithm, we acknowledge that our work focuses on designing a no-regret response for the *buyer* side only and ignores the strategic implications of such an algorithm for the seller. We agree with the reviewer that studying the other side of the problem would be a very interesting future direction.

-----------------------

Reviewer 2: We thank the reviewer for their thorough reading of the paper. With regard to their comment on estimating the rate of participation, we refer the reviewer to our response to Reviewer 3, where we explain in detail why this is indeed possible.

-----------------------

Reviewer 3: We strongly disagree with the reviewer's assessment that the model does not capture a realistic setting.
While the early advertising exchanges were simple mechanisms following the rules the reviewer described, the proliferation of exchanges and the competition between them have led all parties to adopt more sophisticated strategies. Specifically:

Even though exchange A does not get a call every time the publisher sends an impression to exchange B, it is relatively easy for exchange A to know the approximate amount of traffic the publisher sends to other exchanges. This can be done either by estimating the overall traffic the publisher receives, or by randomly monitoring the publisher’s webpage and observing the fraction of times the ads on the page are served by exchange A.

While the exchange needs to pay the publisher a (1 - revshare) fraction, this constraint holds in sum across all impressions. In particular, the exchange can take on the arbitrage risk by promising the publisher a minimum price and recouping the cost later if needed.

The exchange often controls the reserve price, which affects the amount of money the publisher receives.

Finally, we do not address the equilibrium question here. What happens if B runs a smarter algorithm is beyond the scope of this work, but is an obvious avenue for future research. This is a valid criticism of the model. We nevertheless feel the model is justified because, in general, exchange B represents all options other than exchange A, and if there are a number of such options, the response of any individual exchange will not have a significant impact on the aggregate distribution of this outside option. (This is essentially the same reasoning used to justify mean-field equilibria.)

The reviewer is right that some publishers might send an impression serially to different ad exchanges, but as they mention, this results in a poor user experience, and for that reason we choose not to focus on modeling such low-quality publishers.
While we agree with the reviewer that, as is the case for almost any theoretical work, our model does not fully capture the real world, we emphasize that even our simplified setting leads to non-trivial technical questions. We consider our work a first step, and certainly not the last word, towards designing best-response algorithms when facing a no-regret agent.

----

Overall, we highlight that our work implies that an agent competing against others, in a world where the decision maker uses a low-regret learning algorithm, can learn the value of the best action in a low-regret manner. We find this to be an interesting setting in and of itself.