Paper ID: 1154
Title: Pricing a Low-regret Seller

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper considers the following problem. There is a seller who at each time step t has a good that we (the buyer) value at some value v.  At each time step t we can propose a price p_t \in [0,1] to offer to the seller, who will either sell to us or sell to someone else.  We further assume that (a) we can view the seller's other option as an iid random draw from some distribtion D (independently drawn each time step), and (b) the seller is running a low-regret bandit algorithm to determine who to sell to.  The high level idea is that if can set prices p_t just slightly higher than the mean \mu of D, then the seller will adapt to buy from us.  So, if v > \mu, then we want to run a "noisy binary search" algorithm to learn \mu.  The binary search is noisy because the seller doesn't automatically sell to us when p_t > \mu and not sell when p_t < \mu but rather is running a low-regret algorithm.  The main result is that one can perform such a noisy binary search giving the buyer (us) low regret.

Clarity - Justification: The paper is generally well-written and easy to follow. 

Significance - Justification: While the analysis is not hard, the question of performing this kind of noisy binary search---against an opponent who has a regret budget wrt a linear loss---seems a natural one.  I haven't personally seen any paper look at this before and I assume the authors have checked.  

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): I am giving the paper a weak accept because, while I wouldn't call this a "big" result, it gives a nice algorithm for a quite natural noisy-binary search setting arising from interacting with a low-regret algorithm.  This is a setting that apparently had not been considered before.  One thing I recommend the authors do is explain why the buyer cannot simply run a low-regret bandit algorithm himself.  After all, in a two-player zero-sum game, two low-regret algorithms playing against each other will approach a payoff of the minimax optimal value of the game.  The answer is that this game is not zero-sum, and in fact I believe one can construct a "bad" low-regret strategy for the buyer that performs very poorly in this setting (here, I am using "low regret" to mean the usual sense of low regret with respect to the history of play, rather than low-regret compared to knowing \mu).  I think it would be worth working this out.  Other comments:  - The related works section mentions a lot of papers, but I couldn't tell how they relate at a technical level.  E.g., whether the papers focus on the buyer or seller seems to me somewhat irrelevant; the more relevant thing is the model (and perhaps the results).  For instance, the Amin et al papers sound from the title somewhat similar - can you say more about the difference in the model and questions asked?  - If indeed the buyer is using an algorithm such as the one you propose, then the *seller* would be better off not using a low-regret algorithm but instead aiming to find the highest price it can demand from this buyer. I.e., the seller would be better off acting as the Stackelberg leader and demanding as close to v as it can.  Any comments? 

=====

Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):   Online publishers (who offer slots for ads) want to sell adspace to  the highest bidding advertiser. To this end they use  algorithms for regret minimization.   This paper considers the interaction from the perspective of  the advertiser (or advertisement agency). How do I set the  price so that the publisher buys from me?   This paper considers the case that the other advertisers set  their prices stochastically, with unknown mean. The goal is  then to interact with the low regret algorithm to figure out  the mean price. After figuring the mean price, we can set ours  slightly higher so that the advertiser always picks us and we  profit.   The pricing algorithm is in essence a noisy binary search. In  addition, the price has to be set a little higher than the  competition to guarantee the seller chooses us. The amount of  subsidy necessary is a function of the exponent in the high  probability regret bound delivered by the learner.       

Clarity - Justification:   The problem is well-motivated, and the exposition is very clear.       

Significance - Justification:   This paper provides a fresh "flip side" perspective on bandit  learning, thinking about the origin of the rewards. In the  application considered this is not at all obvious, and itself  the result of strategic interaction.   The algorithm is a reasonably straightforward estimation  procedure (which is fine).       

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):   This paper is intriguing because of its novel perspective. It  connects economic consideration with learning.   The answer, algorithm and analysis, are a first step. The  regret bounds are high, O(T^(3/4)), and can quite likely be  improved.   Line 036: "such system" -> systems  Line 040: "the reach". What is this?  Line 068: why is there not enough time, this can be parallelized?  Line 088: stray apostrophe  Line 115: "buying an item to" -> from  Line 129: typesetting of references looks odd. years?   Line 321, footnote 4: I do not see how we can estimate the  rate of participation in our own exchange. For I guess you  mean the fraction of times the seller turns to us instead of  some other exchange. But that was the question? Still, how do  we estimate the total number of impressions the seller has? If  this is a weak point of the paper (which it may be), then it  would be better to be clear about it. Or am I missing  something here?   Line 335: regret -> expected regret   Line 877: "she can try to Designing". Sentence got cut in the middle.         

=====

Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): The paper studies the problem of how a buyer should set the price if he knows that the seller is running no-regret multi-armed bandit algorithm between the buyers. The paper is motivated by the digital advertising market where publisher is selling an ad impression by sending it to one of several ad exchanges.  The paper proposes a toy model. Seller is selling a stream of identical items to two buyers, A, and B. Buyer A is us.  Buyer B is our competitor; the assumption is that B's prices are chosen i.i.d. The seller sends item to one of the buyers, and then receives a price from the buyer. The assumption is that the seller is playing an multi-armed bandit algorithm with sublinear regret.   There are at least three big conceptual flaws with the model, ordered by severity.  1) The biggest problem with the model is that ad-exchange A does *NOT* know if an impression was sent to the competitor B. Exchange A only sees the impressions that were sent to A.  2) Ad exchange has contractual obligation to pay to the publisher e.g. 80% of price it receives from the real-time buyer. (The cut differs for different exchanges.) Ad exchange does not have the freedom send less money to the publisher. That is, it cannot charge its real-time buyer $100 and send only $5 to the publisher; it has to send between $100 and $80 dollars to the publisher. Ad Exchanges only freedom is to change the auction mechanism a little bit, e.g. by changing reserve prices (in a second-price auction), use first price auction, or perhaps artificially lower its cut so that the transaction happens.   3) If A is running this smart algorithm, why B wouldn't be running the same algorithm as well.  The paper ignores all of these issues.  

Clarity - Justification: The paper is clearly written.

Significance - Justification: The paper makes totally unrealistic assumptions about how ad exchanges operate.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The introduction has several other inaccuracies about ad exchanges. For example, it is common that a publisher sends an impression *serially* to several ad exchanges. First it sends an impression with a certain reserve price to ad exchange A. If the impression is not sold there, he sends it to another ad exchange B. Btw. This causes horrible user experience (long we page load times), but many crappy publishers, (especially mobile app publishers) do it because a) it's technically possible, b) the publisher doesn't care about the load time, c) it maximizes publisher's short term profit.

=====