We would like to thank the reviewers for their valuable comments.
Reviewer 1:
- (Strongly) adaptive regret: Indeed, the results we present provide bounds against changing predictor sequences in a contiguous time interval. However, by setting the baseline to a fixed predictor for all time steps, we recover adaptive regret bounds. One can, moreover, consider regret in time intervals against changing sequences, which is technically a stronger result than bounding the regret only over the whole time horizon. In practice, obtaining both results requires essentially the same effort, although sharp interval regret bounds require a slightly more careful adaptation of the parameters. In fact, adaptive regret and tracking/shifting regret are closely related notions, and all methods we are aware of that were developed to minimize adaptive regret are variants of tracking algorithms known earlier (see the discussion in Gyorgy et al. 2012, which also applies to the recent paper of Daniely et al. 2015).
- We indeed compare performance to arbitrary predictor sequences, but the bounds become meaningful only if these predictors change slowly, that is, if the sequence is simple enough relative to the complexity measures (e.g., D_V, D_{TV}^+) we introduce.
- In the paragraph following Theorem 7, we mean that the bound is not stated for a specific (optimized) setting of \eta.
Reviewer 1 & 3: We agree that the full generality of TDM is not utilized in the paper, and we will simplify the form of the algorithm in the final version if the paper is accepted.
Reviewer 3:
We contest most of the claims of the reviewer, concerning the matrix setup and the main result of the paper:
- First, we would like to point out that the loss matrices are not diagonal: according to the definition, the squared loss matrices are diagonal, and the problem is indeed a matrix (and not a vector) problem. The setup is taken from Hazan et al (2012), who showed that this class of matrix-prediction problems is applicable to describe a large number of applications, including online versions of collaborative filtering, gambling, and max-cut.
- The main result of the paper is not the derivation of the matrix algorithm but the unified view of tracking/shifting algorithms. Previously, two types of algorithms/bounds were derived for this problem in the literature: online gradient descent, which automatically admits shifting bounds, and exponential weights, for which fixed-share-type algorithms have been derived. In this paper we establish the link between the two approaches by showing that both can be viewed as a regularized mirror descent algorithm, and our work shows how to obtain shifting bounds for any version of the mirror descent algorithm. Note that previously these results were derived individually, repeating the same steps in each derivation. Our formulation allows a very simple, disciplined derivation of shifting bounds/algorithms for any of these problems. Admittedly, the matrix results are not surprising; they serve to demonstrate the effectiveness of the approach in a setup that had not been worked out before. Note that, thanks to our generic results, the whole proof for the matrix case is about a page (which also includes some proofs included only for completeness), about the same length as a standard proof without tracking.
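To make the unified view concrete, here is a minimal illustrative sketch (our own toy example, not the TDM algorithm of the paper): the classical fixed-share update is exactly an entropic mirror descent step (i.e., exponential weights) followed by a mixing step with the uniform distribution, and it is this mixing that yields shifting/tracking regret bounds. The function name and parameter values below are illustrative choices.

```python
import numpy as np

def fixed_share_step(w, loss, eta=0.5, alpha=0.1):
    """One round of fixed-share exponential weights on the simplex.

    The multiplicative update is mirror descent with the entropic
    regularizer; the fixed-share mixing with the uniform distribution
    keeps every weight bounded away from zero, which is what allows
    the algorithm to track a changing comparator sequence.
    """
    v = w * np.exp(-eta * loss)              # entropic mirror descent step
    v /= v.sum()                             # normalize back onto the simplex
    return (1 - alpha) * v + alpha / len(w)  # fixed-share mixing

# Toy run: the best expert switches halfway, and the weights follow it.
w = np.ones(3) / 3
for t in range(50):
    loss = np.array([1.0, 0.0, 1.0]) if t < 25 else np.array([1.0, 1.0, 0.0])
    w = fixed_share_step(w, loss)
```

After the switch, the weight of the newly best expert recovers quickly because the mixing step prevented it from shrinking below alpha/3 during the first phase.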
- c_\phi should indeed be equal to \tau, thanks for pointing it out.
- Theorem 7 vs l. 508: In Example 6, all u_t are in the simplex with unit L1 norm, and the term T log(1/(1-\alpha)) is the counterpart of the sum with the norms \|u_t\|.
Reviewer 3 & 4:
- l. 273: w' should be v.
Reviewer 4:
- l. 399. w should be u.
- l. 719: yes, and it will be an equality with the choice c_\phi=\tau.