Thank you for your very helpful feedback and for taking the time to read our paper.

Reviewers 1 & 2: We agree that significant space was devoted to building up to the main contribution (MAGIC), and that this can be a lot for the reader to wade through. As you suggest, we will shorten the beginning of the paper (e.g. by placing less emphasis on the consistency of WDR and COPE) to allow for further discussion of COPE and particularly MAGIC.

Reviewer 1: “Where is the model (state-action transition) in equation 2?” The model is used in the \hat q and \hat v terms, which are defined in lines 165-187. For example, \hat v^{\pi_e} is the value function for \pi_e on the model (not the value function for \pi_e on the true MDP).

Reviewer 2: Thank you for the feedback and references; we were not aware of this additional connection. We agree that a complete finite sample analysis of MAGIC would be both valuable and difficult. We will expand our discussion of finite sample performance in the paper, but we suspect that a formal analysis will remain beyond the scope of this paper.

Reviewer 3: Thank you for the corrections, suggestions, and additional reference. We are particularly interested in plotting the bias and variance of off-policy $j$-step returns of different lengths in our experiments, as this could be very informative.