Many thanks for the comments!$
--Assigned_Reviewer_2
Q1: I'm not sure that I see how the considered setting is relevant to applications (e.g., in online advertisement and online recommendation). My concern is that the parameter vector w_star is assumed to be fixed and deterministic throughout and only the binary feedback is stochastic
A1: Our setting applies to online advertisement/recommendation for a fixed user, e.g., recommending items to a registered user on Amazon. In this case, w_star models the unknown interests of that fixed user, and x_t represents the item recommended to this user in the t-th round. We will study the scenario where w_star is allowed to change as future work.
Q2: Also, I wonder if the authors can comment on exactly what properties of the logistic model enable their solution …
A2: The analysis depends on two properties of the logistic model: (i) the derivative of the link function is bounded below, which is used in Lemma 2, and (ii) the link function itself is bounded above, which is used in (17). Our analysis applies to any GLM satisfying these two conditions. This paper studies only the logit model because we focus on one-bit feedback. We will add more discussion in the final version.
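For concreteness, the two properties in A2 can be checked numerically for the logistic link. This is a minimal sketch under our own illustrative assumptions: the action set satisfies |x_t^T w_star| <= R, and the diameter value R = 2.0 and the grid are hypothetical; the bound mu'(z) >= mu(R)(1 - mu(R)) on [-R, R] follows from mu' being symmetric and decreasing in |z|.

```python
import numpy as np

def mu(z):
    """Logistic link function: mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def mu_prime(z):
    """Derivative of the link: mu'(z) = mu(z) * (1 - mu(z))."""
    return mu(z) * (1.0 - mu(z))

R = 2.0                       # hypothetical diameter, so |x_t^T w_star| <= R
z = np.linspace(-R, R, 10001)

# (i) the derivative is bounded below on [-R, R];
# the minimum is attained at the endpoints +/- R
kappa = mu_prime(R)
assert np.all(mu_prime(z) >= kappa - 1e-12)

# (ii) the link itself is bounded: 0 < mu(z) < 1 for all z
assert np.all((mu(z) > 0) & (mu(z) < 1))
```

Note that kappa = mu(R)(1 - mu(R)) shrinks exponentially as R grows, which is consistent with the exponential dependence on R raised in Q4.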
Q3: You say that the problem is a "bandit setting" of an "online linear optimization" problem, but … Is it minimizing the linear regret defined in Eq (2) using feedback generated by the logit model in Eq (3)?
A3: Exactly. Our algorithm is for “minimizing the linear regret defined in Eq (2) using feedback generated by the logit model in Eq (3)”. Furthermore, according to Lemma 1, it also minimizes the nonlinear regret in Eq (4). We will revise the paper to make this clearer.
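The feedback process described in A3 can be simulated in a few lines. This is a minimal sketch under our own assumptions: the dimension d, the random draws, and the seed are illustrative, and only w_star, x_t, and the standard logit link of Eq (3) come from the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

d = 5                                  # hypothetical dimension
w_star = rng.normal(size=d)            # the fixed, unknown parameter vector
x_t = rng.normal(size=d)               # the action played in round t

# Logit model (Eq (3)): P(y_t = 1 | x_t) = 1 / (1 + exp(-x_t^T w_star))
p = 1.0 / (1.0 + np.exp(-x_t @ w_star))
y_t = rng.binomial(1, p)               # the one-bit feedback the learner observes
```

Only the single bit y_t is revealed in each round; the learner never observes p or x_t^T w_star directly.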
Q4: Please state in Thm 3 the resulting regret bound of Algo 1, in terms of the natural parameters of the problem (T, d, R). The regret bound has an exponential dependence on the diameter R …
A4: Thanks for the suggestion. We will state the dependence on T, d, and R explicitly.
Q5: In SLB, the feedback signal (= the incurred loss) might as well be a binary/integer random variable, as long as its expectation is "right".
A5: Allowing the feedback signal in SLB to be binary actually imposes a VERY strong assumption on the noise. Note that Eq. (1) of SLB is equivalent to y_t = x_t^\top w_* + e_t, where e_t is a zero-mean random noise. If the feedback signal is binary (e.g., 1 and -1), then e_t is either 1 - x_t^\top w_* or -1 - x_t^\top w_*, and its distribution is also fully determined by the zero-mean requirement. In other words, the noise in SLB would need to be binary and depend on x_t^\top w_*, which is unusual and lacks justification. So, we argue it is more natural to use the logit model to handle binary feedback.
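The argument in A5 can be made concrete numerically: once the mean m = x_t^T w_* is fixed and the feedback is restricted to {+1, -1}, the noise distribution is pinned down entirely. A minimal sketch, where the value m = 0.3 and the sample size are our own illustrative choices (m must lie in [-1, 1] for +/-1 feedback to be possible at all):

```python
import numpy as np

rng = np.random.default_rng(0)

m = 0.3                      # hypothetical mean x_t^T w_*
p = (1.0 + m) / 2.0          # P(y_t = +1) is forced by E[y_t] = m
y = rng.choice([1.0, -1.0], size=200_000, p=[p, 1.0 - p])
e = y - m                    # the noise e_t can only be 1 - m or -1 - m

assert np.unique(np.round(e, 12)).size == 2   # exactly two noise values
assert abs(e.mean()) < 1e-2                   # zero mean, as required
```

So the "noise" has no free parameters: its support and its probabilities are both dictated by x_t^T w_*, which is the unnatural coupling the answer objects to.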
--Assigned_Reviewer_4
Q6: 1. In Eq. (12), where does the \beta/2 come from?
A6: We introduce \beta/2 because the proof of Theorem 3 (Line 1559 of the supplementary material) requires $\gamma_T \geq R^2 \beta/2$. We will remove \beta/2 from Theorem 1 and introduce it in Theorem 3.
Q7: In Figure 3, it would be more interesting if x-axis is the actual running time …
A7: Thanks for the suggestion. We will revise the experiments accordingly.
--Assigned_Reviewer_5
Q8: Related works. The structure of Section 2 is a bit weird …
A8: Thanks for these suggestions. We will reorganize Section 2 accordingly.
Q9: Missing References. … I believe there are other ways to perform online logistic regression, like Stochastic Gradient Descent (see e.g. [Bach, Adaptivity …, JMLR 2014]) … I believe that in the paper by Chapelle and Li on the use of Thompson Sampling …
A9: We agree that other algorithms can also be used to perform online logistic regression. We choose the online Newton step because it allows us to construct a tight confidence region. We will add a discussion of using the “SGD of Bach” and the “Thompson Sampling of Chapelle and Li” to improve efficiency.
Q10: The theorems. In the proof of Theorem 1, rather that the gradient of \bar{f_t} …
A10: Thanks for these suggestions. We will improve the writing accordingly.
Q11: The experimental section is not very satisfying … which parameters are used in GLB (theoretical parameters or manually tuned?) …
A11: Thanks for these suggestions. We will revise the experimental section accordingly. The parameters of GLB are manually tuned. We will compare with Thompson Sampling and a heuristic GLB with online logistic regression.
Q12: This algorithm does not appear to handle non-static context set (D_t in place of D) … can this be improved?
A12: Similar to OFUL, our algorithm can handle non-static context sets, that is, we can use D_t instead of D. This claim is easy to verify by checking the proof of Theorem 3 in the supplementary material, which follows the analysis of OFUL. We will elaborate on this point in the final version.