We thank the reviewers for their time and comments. We address specific comments below.
Assigned_Reviewer_3
Thank you for all the comments; they will be addressed in the final version. In particular, we will give a more detailed proof sketch of Corollary 5.1 in the body of the paper.
Assigned_Reviewer_4
Thank you for the constructive review. We address your main concerns below and hope you will consider changing your score:
1. We do not consider the theoretical level of the paper to fall below that of many previously published ICML papers. In particular, confidence-bound analysis appears extensively in the literature, and we find our extension to worst-case-distribution confidence bounds (via regret-bound analysis) interesting and far from straightforward. Additionally, as we state in the paper, the main contribution is conceptual rather than technical – it is one of the first attempts to marry online learning and statistics. This seems to us important in the context of quantifying uncertainty in many real-world scenarios.
2. We believe that the statements you cited are well supported in the paper: the generality with respect to ML-based methods follows from the fact that we allow an *arbitrary* distribution of the noise terms, whereas an ML-based approach requires the distribution to be fixed and known in advance (in most cases, a Gaussian distribution). The statement on “more complex scenarios and models” is backed by our preliminary experimental results, in which uniform and discrete errors are used to generate the data. Overall, we find this statement well justified in the paper, but we agree to soften it if the reviewers find it overstated.
3. You are right that the practical value of this work should ultimately be judged on real data. However, the quality of the variance estimation can be verified only on synthetic data, for which the true variance is known; on real data, only the quality of the signal prediction can be verified, which is not the point of the paper. Overall, we agree that a more comprehensive study (including real data) is needed to fully judge our approach, but this is outside the scope of an ICML paper.
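To illustrate the synthetic-data point: the sketch below (a hypothetical setup, not the paper's actual experimental code) generates signals corrupted by the kinds of non-Gaussian error terms mentioned above – uniform and discrete – whose true, time-varying variance is known by construction, so a variance estimator can be evaluated against ground truth.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000

# Hypothetical underlying signal; the paper's actual setup may differ.
signal = np.sin(np.linspace(0, 8 * np.pi, T))

# Time-varying noise scale, so the true variance changes over time.
scale = 0.5 + 0.4 * np.cos(np.linspace(0, 2 * np.pi, T))

# Two non-Gaussian error models: (a) uniform on [-scale, scale],
# (b) discrete +/- scale with equal probability.
uniform_noise = rng.uniform(-scale, scale)           # Var = scale**2 / 3
discrete_noise = scale * rng.choice([-1.0, 1.0], T)  # Var = scale**2

y_uniform = signal + uniform_noise
y_discrete = signal + discrete_noise

# Because the data are synthetic, the true per-step variance is known
# exactly, which is what makes verification of a variance estimator
# possible here but not on real data.
true_var_uniform = scale**2 / 3
true_var_discrete = scale**2
```

On such data, any variance estimate can be compared directly against `true_var_uniform` or `true_var_discrete`; with real data there is no such ground truth to compare against.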
Thank you for all of your minor comments – we agree with most of them and will fix them in the final version.
Assigned_Reviewer_5
Thank you for the careful reading of the paper. Below we try to alleviate your concerns about the correctness of our proofs, and hope you will reconsider your score.
The point you raise is indeed very delicate, but there is no error. The sequence of losses \ell_1, \dots, \ell_T is in fact not stochastic; rather, it is chosen by an adversary, as is common in the OCO literature. Note that even if the adversary chooses to generate \ell_1, \dots, \ell_T in a stochastic manner, a^* is still defined as the optimum over this specific realization of the loss functions, and thus it is clear that a^* is not a random variable.
The only source of randomness is the biased gradients we receive as feedback, which the adversary controls through \tilde{\ell}_1, \dots, \tilde{\ell}_T. The expectation is taken with respect to this randomness, and thus the computation in lines 1027-1032 is correct for any a, and in particular for a^*. Similar arguments can be found in the literature (for example, in http://ocobook.cs.princeton.edu/OCObook.pdf , page 107).
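To make the argument concrete, here is a minimal sketch in our notation (the exact regret expression in lines 1027-1032 may differ; this only illustrates why a^* passes through the expectation):

```latex
% The adversary fixes \ell_1,\dots,\ell_T, so
%   a^* = \arg\min_{a} \sum_{t=1}^{T} \ell_t(a)
% is a deterministic quantity. The only randomness is in the noisy
% feedback, so for any fixed comparator a, and in particular a = a^*:
\mathbb{E}\Big[\sum_{t=1}^{T} \ell_t(a_t) - \sum_{t=1}^{T} \ell_t(a^*)\Big]
  \;=\; \sum_{t=1}^{T} \mathbb{E}\big[\ell_t(a_t)\big]
        \;-\; \sum_{t=1}^{T} \ell_t(a^*),
% where the expectation is taken only over the feedback noise, and
% \ell_t(a^*) passes outside the expectation precisely because a^*
% is not a random variable.
```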
In the context of variance estimation, the explanation above holds as follows: each loss function depends on the variance chosen by the adversary, which is deterministic. The feedback, however, depends on the specific realization of the error term and on our prediction of the signal; the latter is responsible for the bias, while the former is the source of randomness in our setting.