First of all, we thank the reviewers for their valuable comments.

-To R1
*(1)&(2): As R1 pointed out, Th4 & Cor5 are obtained by variations and combinations of previous results. We see them as building blocks to introduce our approach. We do not hide this, and we already cite the important works: Lacasse et al. 06, Catoni 07, McAllester & Keshet 11. In fact, the result of Germain et al. 15 comes from Lacasse et al. 06. We will add the reference, and state that Maurer 04 already provided a PAC-Bayes bound for arbitrary loss functions.
*About the paper's idea: Germain et al. 13 adapted the works of Ben-David et al. 06 / Mansour et al. 09 to PAC-Bayes theory by proposing a domain divergence expressed as an expectation of the H-divergence/discrepancy; hence, the proof of their bound relies on the triangle inequality. In contrast, our proof relies on an equality that separates the risk into two terms, together with the introduction of a multiplying term (via Hölder's inequality), leading to a different domain divergence measure (related to the weight ratio); a generic sketch of this Hölder step is given at the end of this response. This provides a new PAC-Bayesian study of DA, a setting for which few works exist. We claim that this result brings new insights into when DA can be useful (see also our answer to R6).
*About Fig2: Since DALC and PBDA are based on the same parametrization of the linear classifiers, they are likely to exhibit similar behavior on a small 2D dataset. Nevertheless, we can see that DALC enforces larger target margins.

-To R1&R4
*About the experiments: Our goal is not to significantly improve the results of Germain et al. 13, but rather to propose a better theoretical framework. However, since the 12 DA tasks are closely related, the average performance across these tasks indicates that DALC is slightly better. Indeed, when we test whether the test risks of the two methods share the same mean with a Wilcoxon signed-rank test at a 5% significance level, we obtain a probability of 89.5% that DALC is better than PBDA (a minimal numerical sketch of this test is given at the end of this response). Thus, even if the empirical results are not significantly better than those of previous works, they confirm that our theoretical study is sound, in addition to being more interpretable than the previous PAC-Bayesian one. We will add and discuss the result of this test in case of acceptance.

-To R4
*About the weight ratio: Since our method does not learn a new representation, a DA bound is expected to be less informative in the presence of regions with a large target/source weight ratio. This is consistent with the DA results of Ben-David et al. 12. It does not mean that DA is impossible, but it suggests that DA may be difficult without additional (sometimes strong) assumptions. While this aspect was probably hidden inside the additional term of Germain et al. 13, our result states it explicitly. In this situation, one may pre-process the data to obtain a good representation that reduces the weight ratio.
*We would also like to point out that, contrary to other frameworks (such as Rademacher or VC analyses), there are only a few PAC-Bayesian studies of DA. We think that our analysis is more interpretable than the results of Germain et al. 13, and that it leads to interesting perspectives for DA (especially when the models take the form of a combination/aggregation).

-To R6
R6 is right on two points: *many (unsupervised) DA methods have proposed ways to estimate the divergence from unlabeled data; *our new divergence is hypothesis-free, but it cannot be estimated from unlabeled data (without additional assumptions).
However, we disagree that the novelty lies mainly in the introduction of this domain divergence, and that our method cannot deal with unsupervised DA. While other approaches study unsupervised DA using a divergence measure estimated from unlabeled data, we adopt another strategy: the unlabeled data are exploited through the target disagreement d_T(\rho). We then define a tradeoff between the target disagreement and the source joint error, and from there our divergence can be seen as a particular parameter controlling this tradeoff (a schematic objective is given below). The experiments confirm that one can empirically tune this parameter to adapt in an unsupervised context. As far as we know, considering a weighted combination of the target disagreement and the source joint error is entirely new and very specific to our framework. Finally, note that we focused on unsupervised DA to compare ourselves to Germain et al. 13, one of the rare closely related references. However, our framework is more general and can easily be specialized to usual DA assumptions (e.g., covariate shift, where all terms of the bound can be estimated); see Section 5.2. Additionally, our study allows one to integrate some labeled target data without modification. We think this is a desirable feature, since recent works suggest that incorporating target labels is an important direction for DA (Berlind & Urner, ICML 2015; Cortes, Mohri & Medina, JMLR 2016).
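To make this tradeoff concrete, here is a schematic objective illustrating what we mean; the weights C and A are illustrative hyperparameters, and the exact constants and regularizer used by DALC differ:

\[ \min_{\rho}\; C\,\widehat{d}_{T}(\rho) \;+\; A\,\widehat{e}_{S}(\rho) \;+\; \mathrm{KL}(\rho\,\|\,\pi), \]

where \widehat{d}_{T}(\rho) is the target disagreement estimated from unlabeled target data, \widehat{e}_{S}(\rho) is the source joint error estimated from labeled source data, and \mathrm{KL}(\rho\,\|\,\pi) is the usual PAC-Bayesian regularizer. In this view, the divergence term of the bound is absorbed into the weights C and A, which can be tuned empirically in the unsupervised setting.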
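For completeness, here is a generic sketch of the Hölder step mentioned in our answer to R1; it shows the general change-of-measure mechanism behind the multiplying term, not the exact statement of our bound. Assuming the target density T is absolutely continuous w.r.t. the source density S, for any non-negative function f and conjugate exponents p, q \ge 1 with 1/p + 1/q = 1, Hölder's inequality gives

\[ \mathbb{E}_{x\sim T}\big[f(x)\big] \;=\; \mathbb{E}_{x\sim S}\!\left[\tfrac{T(x)}{S(x)}\,f(x)\right] \;\le\; \left(\mathbb{E}_{x\sim S}\!\left[\Big(\tfrac{T(x)}{S(x)}\Big)^{q}\right]\right)^{\!1/q} \left(\mathbb{E}_{x\sim S}\big[f(x)^{p}\big]\right)^{\!1/p}. \]

The first factor is the weight-ratio-based domain divergence, while f stands for the quantity obtained after exactly separating the target risk into its two terms.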
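Finally, a minimal sketch (in Python, using scipy.stats.wilcoxon) of the paired one-sided Wilcoxon signed-rank test mentioned in our answer to R1&R4; the per-task risk values are illustrative placeholders, not our actual results:

import numpy as np
from scipy.stats import wilcoxon

# Illustrative placeholders for the per-task test risks on the 12 related DA tasks
# (NOT the values reported in the paper).
dalc_risks = np.array([0.21, 0.18, 0.25, 0.30, 0.19, 0.22,
                       0.27, 0.24, 0.20, 0.23, 0.26, 0.28])
pbda_risks = np.array([0.22, 0.19, 0.26, 0.31, 0.20, 0.23,
                       0.28, 0.25, 0.21, 0.25, 0.27, 0.29])

# One-sided paired test: is DALC's test risk smaller than PBDA's across tasks?
stat, p_value = wilcoxon(dalc_risks, pbda_risks, alternative="less")
print(f"Wilcoxon statistic = {stat}, p-value = {p_value:.3f}")
# At the 5% significance level, p_value < 0.05 supports that DALC is better.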