We thank the reviewers for their considered feedback. We first respond to a common concern:

> Improved presentation of results

Our core results seem appreciated by all reviewers. We will, however, aim for a more direct presentation of the motivation and value of these results. Specifically, we can:
- expand Sections 1 and 2 to make the overall picture clearer; this should make the flow of the subsequent sections logical (per R3)
- tighten the inline discussion (per R2), and
- make the aim and conclusions of the experiments more explicit (per R1, R3).

We also intend to emphasise the Bregman identity in Lemma 2 as a contribution of wider interest, as we think its generalisations might have interesting applications.

We now respond to some specific points concerning the overall picture.

R1

> what more beyond link between DRE and CPE?

The reviewer appears comfortable with Sections 3--5, which establish the general connection between DRE and CPE. Our aim with Sections 6--7 is to address the question of what one can actually do with this connection. Specifically:
- for DRE tasks, Section 6 aims to give some design guidance on choosing amongst the infinitude of feasible CPE losses. This is done by explicating the weight function over density ratios, its non-obvious relationship to the familiar one over cost ratios (Equation 19), and then illustrating how one can design new (convex) losses based on a given weight over density ratios (Section 6.2). We feel these issues are a natural follow-up to Proposition 6.
- for bipartite ranking, a task where "conventional" CPE losses (e.g. logistic, exponential) are employed, Section 7 illustrates how the LSIF loss, traditionally thought of as being for DRE, can be especially useful owing to its weight function placing more emphasis on large probability values. This analysis is facilitated by our establishing in Lemma 1 that LSIF can be viewed as performing CPE, and thus, we feel, is relevant to the paper.

As per our first response above, we will look to make these points clearer.

> point of covariate shift experiments?

The results in 8.2 illustrate that the analysis of Section 5 is borne out in practice. Specifically, we show that in addition to the well-studied KLIEP and LSIF losses, one can also use more "conventional" CPE losses such as the logistic and exponential (a brief illustrative sketch of this pipeline appears at the end of this response). We will clarify that we certainly consider KLIEP and LSIF to be viable, but simply meant to convey that they are "unconventional" as CPE losses.

> point of ranking experiments?

The results in 8.3 illustrate that the analysis of Section 7 is borne out in practice. Specifically, we show the consistent superiority of LSIF over logistic regression (and competitive performance with p-classification) on "ranking the best" tasks. This is of interest since LSIF has a closed-form solution (Equation 12). We thus view the results as motivating the use of LSIF as a competitive method for such problems.

> need for 2.2 and 5.1?

Section 2.2 is simply meant to serve as background on the relevant theory of CPE losses; we included it since we did not think all of it was especially standard to the wider ML community. Further, we believe all the concepts introduced are used later in the paper, but we would be happy to suppress anything specific the reviewer identifies as unnecessary.

Section 5.1 is, we strongly believe, novel, and indeed a central contribution of the paper. As noted, we are unaware of Lemma 2 appearing in previous work.
We are not sure which part of the proof is repeated, but would be happy to acknowledge any relevant work the reviewer may have in mind. (In the event that the reviewer is referring to 6.1, the aim of that section was discussed in the earlier response. The proof of Lemma 11 is admittedly simple, but the form of the weight in Equation 19 is, we believe, non-obvious.)

> Limitations of importance weighting

We completely agree that noting the limitations of importance weighting for covariate shift correction would be valuable, and would help contextualise the work. We are happy to add a discussion, with the mentioned citations, on this point.

R2

> factor of 1/2 in Proposition 6?

The reviewer is indeed correct; this is a typo and will be corrected.

R3

> more precise discussion in Introduction

We would certainly be willing to clarify any vague sentences in this section; pointers to any specific instances the reviewer has identified would be appreciated.

> value of new application of DRE losses?

See our comment to R1 about the point of the results in 8.3, the analysis of Section 7, and why the latter relies on the link established in Lemma 1 of the paper.

> open questions?

We certainly agree that the present paper is not the final word on the matter. To correctly contextualise the work, in addition to the response to R1 regarding the limitations of importance weighting, we will look to expand the Conclusion to discuss issues that would be important for future study.
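For concreteness, we append a minimal sketch of the pipeline referenced in our response to R1 regarding Section 8.2: a "conventional" CPE loss (here, the logistic) is used to estimate density ratios, which then serve as importance weights for covariate shift correction. The sketch is purely illustrative and is not our experimental code; the data, sample sizes, and the use of scikit-learn are assumptions made for the example.

```python
# Illustrative sketch: density ratio estimation via class-probability
# estimation (logistic loss), then importance weighting for covariate shift.
# The data, sample sizes, and library choice are assumptions for the example.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical train (denominator) and test (numerator) covariates.
X_tr = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_te = rng.normal(loc=0.5, scale=1.0, size=(300, 2))

# Class-probability estimation: pool the samples, labelling test as 1 and
# train as 0, and fit a probabilistic classifier with the logistic loss.
X = np.vstack([X_te, X_tr])
y = np.concatenate([np.ones(len(X_te)), np.zeros(len(X_tr))])
cpe = LogisticRegression().fit(X, y)

# Convert class probabilities eta(x) into density ratio estimates
#   r(x) = p_test(x) / p_train(x) ~= (n_tr / n_te) * eta(x) / (1 - eta(x)),
# where the leading factor corrects for the class prior induced by pooling.
eta = np.clip(cpe.predict_proba(X_tr)[:, 1], 1e-6, 1 - 1e-6)
w = (len(X_tr) / len(X_te)) * eta / (1.0 - eta)

# The weights w can then be passed to a downstream learner, e.g.
#   LogisticRegression().fit(X_tr, y_tr, sample_weight=w)
# for some (hypothetical) downstream labels y_tr.
```

The same recipe applies with other CPE losses; the analysis in the paper concerns how the choice of loss (equivalently, its weight function) shapes the resulting density ratio estimates.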