We thank the reviewers for the positive and detailed feedback. Let us briefly address the main issues raised in the reviews.
Assigned_Reviewer_2: We assume that the adversary choosing the cost functions can base its choices on the draws of side information, but does not see the player's randomization (that is, it is oblivious to the player). We will clarify this in the revision. It is possible to deal with an adaptive adversary directly using the relaxation framework, but the analysis is more involved (specifically, the condition in Eq. (6) becomes more involved).
Assigned_Reviewer_3:
We will add some intuition/explanation of the algorithm and a very brief proof sketch, but there is not enough space to give all the background. Instead, we will add specific pointers to the existing literature, so that the interested reader can get a better grasp of these new techniques.
Regarding the issue of internal randomization: there are two places to check. The first is the initial condition (6), which holds for all sequences of q_t, including strategies that depend on past randomizations. The second is the recursive condition (5). Note that q_t minimizes a term that does depend on the random choices y-hat through the information I_{1:t-1}. Hence, the choices q_t definitely depend on the internal randomization.
Specifics:
“Certifies the inequalities (5) and (6)” -- we mean that q_t need not be the exact minimizer, yet the inequalities still hold for that q_t. In this case we call q_t admissible.
Regarding accumulation of errors from approximate ERM: the estimates we build are unbiased no matter which q we choose. This unbiasedness is the only property required for going forward, and the only loss from the approximation is incurred on the present round. Hence, the errors do not accumulate. We will add a note about this subtle point -- thanks!
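To illustrate the unbiasedness claim, here is a minimal sketch. It assumes an importance-weighted cost estimate, which is one standard way to obtain unbiasedness and is used here only as an illustration; the function names, the cost vector, and the two distributions are hypothetical and are not taken from the paper. The sketch computes the exact expectation of the estimate and shows that it equals the true costs for any full-support q, whether or not q is the exact minimizer.

```python
def importance_weighted_estimate(costs, q, action):
    """Estimate that is nonzero only at the played action (assumed form)."""
    est = [0.0] * len(costs)
    est[action] = costs[action] / q[action]
    return est


def expected_estimate(costs, q):
    """Exact expectation of the estimate when the action is drawn from q."""
    k = len(costs)
    mean = [0.0] * k
    for a in range(k):
        est = importance_weighted_estimate(costs, q, a)
        for i in range(k):
            mean[i] += q[a] * est[i]
    return mean


# Hypothetical costs and two full-support distributions: one standing in
# for an exact minimizer, one for an approximate (perturbed) choice.
costs = [0.2, 0.9, 0.5]
q_exact = [0.7, 0.1, 0.2]
q_approx = [0.5, 0.3, 0.2]

for q in (q_exact, q_approx):
    mean = expected_estimate(costs, q)
    assert all(abs(m - c) < 1e-12 for m, c in zip(mean, costs))
```

Under this assumed estimator, a suboptimal q changes the variance of the estimate and the loss on the current round, but not its mean, which is why the information carried to later rounds is unaffected.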
The reviewer is correct that Lemma 1 only addresses an oblivious adversary. Eq. (6) for the non-oblivious case is more involved, and we omit it for clarity of presentation.
Assigned_Reviewer_4:
The reviewer is correct that our method requires d*n oracle calls. We do not know at the moment whether this can be improved to O(sqrt{n}) to match the oracle complexity of Agarwal et al. While this is possible in the i.i.d. setting, the hybrid scenario we consider is more demanding.