We thank the reviewers; all suggested corrections will be incorporated in the final version. We start with common threads across the reviews.

The focus of this paper is on scenarios where the strength of a relationship X->Y (causal or otherwise) is measured purely through the conditional distribution p_{Y|X} rather than the joint distribution p_{X,Y}. We stress that the axiomatic study of dependence measures for joint distributions has a long and distinguished history (beginning with Shannon), whereas the study of dependence measures on conditional distributions seems relatively unexplored. We are happy to replace the word “causal” by “conditional dependence” (as suggested by Reviewer 3), since it describes our work just as well and avoids any confusion with causality identification/measurement.

Relevance to Application: In our showcased application on gene regulatory pathways, the many purely causal relationships are already known from direct intervention experiments; what is critical is to know exactly which causal relationships are active at which time during a specific intervention (drug release).

Key contributions: There are two main contributions, the first being conceptual and the second a strong technical one. (a) Axiomatic motivation and introduction of Shannon capacity to measure the strength of conditional dependence. (b) Proposal of novel and consistent fixed k-NN estimators for the novel quantities UMI and CMI. Proving consistency for fixed k-NN estimators is itself a huge undertaking even for relatively simpler quantities such as entropy. Thus proving consistency for fixed k-NN estimators of UMI and CMI, which involve an optimization inside the estimation, is a technically important result in its own right, and our proof techniques are of independent mathematical interest.
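For concreteness, the fixed-k regime in (b) can be illustrated with the classical Kozachenko-Leonenko entropy estimator, where k stays constant as N grows. The sketch below is only an illustration of that regime: it is not the UMI/CMI estimator of the paper, and the function names `digamma_int` and `kl_entropy`, the brute-force neighbor search, and the default k=5 are assumptions made for this example.

```python
import math
import numpy as np


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -np.euler_gamma + sum(1.0 / j for j in range(1, n))


def kl_entropy(samples, k=5):
    """Kozachenko-Leonenko differential entropy estimate (nats) with fixed k.

    Illustrative sketch: brute-force O(N^2) distances, Euclidean norm.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a flat array as N one-dimensional samples
    n, d = x.shape

    # Pairwise Euclidean distances; exclude each point from its own neighbors.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # rho[i] = distance from sample i to its k-th nearest neighbor.
    rho = np.partition(dist, k - 1, axis=1)[:, k - 1]

    # Log-volume of the d-dimensional unit Euclidean ball.
    log_c_d = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)

    return digamma_int(n) - digamma_int(k) + log_c_d + d * np.mean(np.log(rho))
```

On samples from a standard normal, the estimate should be close to 0.5 log(2 pi e) nats for moderate N, with k held fixed rather than tuned as a function of N.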
The main text in the present draft is skewed towards our first contribution, with the second one mostly in the supplementary material; we will fix this in the final version. In retrospect, we feel that this imbalance has caused some confusion among readers about the relative importance of the two aspects.

Response to Reviewer 3: 1. Your comment that our usage of the word "causality" is confusing/inappropriate is well taken (see above). However, we point out that even when a causal relationship is known and confounding is ruled out, measuring the strength of a causal relationship is non-trivial and of immense interest (see Janzing et al. 2013). 2. The notation P_U and U_X is confusing/overloaded; we agree and will fix this. 3. Re (10): for continuous-valued X and Y, some form of regularization is needed; otherwise CMI is infinite. We suggested L2 regularization, but in our experiments on real data we found that L1 regularization works just as well.

Response to Reviewer 4: 1. Justification of Axiom 3: this axiom models the intuition that the stronger a causal relationship, the more we can change the effect by changing the cause. This is perhaps the most important axiom in our list. Re notation: the convex hull of distributions q1 and q2 is simply the set of all distributions of the form a . q1 + (1-a) . q2 with 0 <= a <= 1. 2. \rho_{k,i} is defined on line 449. 3. Additional regularization of w_i in line 559: see our response to Reviewer 3. 4. The main attraction of our estimators is that there are hardly any parameters to tune: k=5, h_N = 0.2 (with data normalized to unit variance, a standard choice for KDEs), and a is found as the empirical average of the square of the samples of X.

Response to Reviewer 5: 1. Much of the literature on k-NN estimators is in the regime where k grows with N (k \geq log N for the paper suggested by the reviewer); in practice, tuning k as a function of N is hard and systematic choices are not known.
Our focus is on fixed k, typically set to 4 or 5. 2. CMI is computed using a convex program, and therefore all local minima are global. We used standard adaptive gradient descent and found no difference in convergence over multiple runs. 3. There are two different experiments on the real data set (Figure 1), both of which were the central results of (Krishnaswamy et al. 2014). We have extensive synthetic experiments, but they did not seem to add any extra insight, so we omitted them for brevity.

Response to Reviewer 6: 1. We agree that we did not explain the actual experimental setup and will address this in the final version. 2. Axioms 3 and 4 are not related to causality per se, but to the strength of a known causal dependence relationship (elaborated earlier). 3. Re point (d) in Section 5: we included this result for completeness; finding the direction of a causal relationship is not the focus of this paper. Indeed, reversing the decision would give higher than 50% accuracy, but could also amount to overfitting to the data sets. We also note that some causal direction estimators in the literature have accuracy below 50%.
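The capacity view in contribution (a), and the remark to Reviewer 5 that the optimization has no spurious local optima, can be illustrated in the simplest discrete setting. The sketch below uses the classical Blahut-Arimoto iteration, swapped in here purely for illustration; it is not the adaptive gradient descent on the regularized k-NN objective used for CMI in the paper. The names `kl_rows` and `blahut_arimoto` and the row-stochastic layout `channel[x, y] = p(y|x)` are assumptions of this example.

```python
import numpy as np


def kl_rows(w, q):
    """Per-input divergence D(w[x, :] || q) in nats, using 0 log 0 = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(w > 0, np.log(w / q), 0.0)
    return (w * log_ratio).sum(axis=1)


def blahut_arimoto(channel, iters=1000, tol=1e-12):
    """Capacity (nats) of a discrete channel given only p(y|x).

    channel[x, y] = p(y|x); each row must sum to 1.
    Returns (capacity, maximizing input distribution).
    """
    w = np.asarray(channel, dtype=float)
    p = np.full(w.shape[0], 1.0 / w.shape[0])  # start from the uniform input

    for _ in range(iters):
        # Reweight each input by how far its output row is from the marginal.
        d = kl_rows(w, p @ w)
        new_p = p * np.exp(d)
        new_p /= new_p.sum()
        converged = np.abs(new_p - p).max() < tol
        p = new_p
        if converged:
            break

    # Mutual information at the final input distribution.
    capacity = float(p @ kl_rows(w, p @ w))
    return capacity, p
```

Note that the function takes only the conditional distribution as input: the input distribution is optimized out, so the resulting number is a property of p_{Y|X} alone. Because mutual information is concave in the input distribution for a fixed channel, the iteration converges to the global maximum regardless of the data-generating p_X.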