We thank the reviewers for their comments. We are glad our work is seen as a “noticeable contribution” (R1) for distances, barycenters & alignment between heterogeneous matrices (R6), imposing no specialized structure on the input (R7). We address key points below.

##R1##
Q: compelling advantages
Since no other method can average unregistered similarity matrices of varying sizes, it is not entirely fair to speak of "advantages". The usefulness of our algorithm itself is demonstrated experimentally on sophisticated tasks.

Q: computational burden
That burden is spelled out in Alg. 1: we only use matrix products, with no hidden big-O constants. Each iteration (l.498) requires, for each s, K matrix-vector products in l.507~509 (a non-regularized approach would be at least cubic) + 1 matrix-matrix product (l.504; a naive computation of c_s would be quartic). The total number L of iterations is small, & so is K (~10) thanks to warm restarts.

Q: using Wasserstein barycenters
Wasserstein (W) barycenters solve a fundamentally different problem: they can only average registered distributions defined on the same space. Minor changes in the inputs (e.g. rotations) result in radically different W barycenters. All examples in Figs. 1 & 2 are *failure* modes for W barycenters, hence we do not agree that W barycenters could yield "much more impressive results", certainly not on these examples.

Q: comparisons with Dryden et al. (2009)
This reference does not appear in Rupp’12/Hansen’13. Please provide a detailed pointer. If code is available, we will compare.

Q: applications in the field of domain adaptation with kernels
This is indeed a nice application!

Q: generalizing some previous barycenter method?
No. Although GW can be applied to datasets on which W barycenters can be used, GW does not reduce to W, even under additional assumptions.

Q: become simpler if all ps & p are assumed to be uniform?
No. The main difficulty is that the inputs aren’t registered, which requires a joint assignment to the (unknown) barycenter.

Q: 2. L256-257: I found the discussion confusing
Indeed. L.257 can be removed.

Q: 3. Eq. (8) better motivated
Indeed, and we will follow your advice.

Q: 4. typical runtimes
For quantum chemistry, the process takes ~15 min on 16 CPUs. Other tasks each take 2 minutes or less.

##R6##
Q: significance of this line of research
While GW appeared in 2007, entropic regularization is a uniquely viable way to make it practical: it dramatically accelerates all computations & enables solving advanced problems such as barycenters.

Q: 1. previous method can do similar job?
Unlike ICP, GW is well-suited to a range of deformations *and* works well for derived problems such as computing barycenters. Extending ICP to barycenters is nontrivial.

Q: 2. visualization of barycenters
Our general method isn’t the best for *all* possible classes of shapes (see our answer to R7 about shape spaces). It is, however, extremely general & numerically efficient. We will add a discussion of shape-space methods.

Q: 3. more general transformations?
We *never* hardcode 2D rotations: rigid motion is the natural invariance for Euclidean distance matrices, but GW is robust to nonrigid deformations (this is the purpose of our 2D tests). We experimented on 3D surfaces using geodesic distances, with similar successes/failures as in 2D; we’ll showcase some examples.

Q: 4. alignment results?
Color coding (green->red) visualizes the registration to the barycenter. We will explain this & add an example of registration computed from the T_s’s.

Q: 5. The results in Table 1 are not promising
We disagree. In the natural sciences, good accuracy with simple tools is sometimes preferable to better accuracy with over-parameterized approaches. Our approach (3-NN with the GW metric) is very simple.
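The computational claims in our replies to R1 (matrix products only, warm-started Sinkhorn steps) and to R6 (rigid motions leave Euclidean distance matrices unchanged; 3-NN with the GW metric) can be illustrated with the minimal NumPy sketch below. It is not our implementation of Alg. 1: it assumes the square loss, a fixed regularization eps and one Sinkhorn projection per outer step, and the names `distance_matrix`, `gw_cost`, `entropic_gw` and `knn_predict` are hypothetical. `knn_predict` further assumes a regression target averaged over the k nearest neighbors, which the replies above do not specify.

```python
import numpy as np


def distance_matrix(Z):
    """Pairwise Euclidean distances of a point cloud, rescaled to [0, 1]."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return D / D.max()


def gw_cost(C1, C2, p, q, T):
    """Square-loss tensor product: entry (i, j) is sum_kl (C1[i,k] - C2[j,l])**2 * T[k,l].

    Expanding (a - b)**2 = a**2 + b**2 - 2ab and using that T has marginals
    (p, q) replaces the O(n^2 m^2) quadruple sum with dense matrix products.
    """
    const = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :]
    return const - 2.0 * C1 @ T @ C2.T


def entropic_gw(C1, C2, p, q, eps=0.01, n_outer=50, n_inner=10):
    """Illustrative entropic GW solver (square loss); eps, n_outer, n_inner are arbitrary choices."""
    T = np.outer(p, q)   # independent coupling as the starting point
    v = np.ones_like(q)  # Sinkhorn scaling, warm-started across outer iterations
    for _ in range(n_outer):
        cost = gw_cost(C1, C2, p, q, T)
        # Shifting the cost by a constant leaves the Sinkhorn projection unchanged.
        K = np.exp(-(cost - cost.min()) / eps)
        for _ in range(n_inner):  # two matrix-vector products per Sinkhorn step
            u = p / (K @ v)
            v = q / (K.T @ u)
        T = u[:, None] * K * v[None, :]
    loss = float(np.sum(gw_cost(C1, C2, p, q, T) * T))
    return T, loss


def knn_predict(D, y_train, k=3):
    """Hypothetical k-NN regressor: D[i, j] is a GW loss between query i and training item j."""
    nearest = np.argsort(D, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Usage: a point cloud and a rigidly rotated copy give the same distance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = X @ R.T
C_X, C_Y = distance_matrix(X), distance_matrix(Y)
print(np.allclose(C_X, C_Y))  # True
p = np.full(20, 1.0 / 20)
T, loss = entropic_gw(C_X, C_Y, p, p)

# 3-NN on a toy 1 x 5 matrix of precomputed losses.
D = np.array([[0.1, 0.2, 0.3, 5.0, 6.0]])
print(knn_predict(D, np.array([1.0, 2.0, 3.0, 10.0, 20.0])))  # [2.]
```

The saving in `gw_cost` relies on the square loss splitting into a term in a, a term in b, and a product of the two; a loss without such a split would not reduce to matrix products in this way.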
##R7##
Q: a strong argument made for the utility of this regularizer
As mentioned to R1, our method is orders of magnitude faster, parallel & simple to implement, unlike classical (Hungarian, network flow) solvers. We inherit these advantages from (Cuturi'13).

Q: matrices of different rank
Prop. 4 shows that positive semidefiniteness (PSD) is maintained, but the barycenter of low-rank similarities may not stay low rank. If rank is crucial (e.g. for storage), we suggest adding a variational rank penalty and then using a first-order scheme to solve problem (12) (e.g. Frank-Wolfe for a trace-norm penalty).

Q: The example with MNIST works, but then so do many things
We will discuss works on Fréchet means in shape spaces. These are specific to smooth shapes, use a stronger metric, & aren’t intrinsic (see our comment about ICP to R6). Their barycenters would thus not be comparable.

Q: how to interpret the results on the molecular data
As mentioned to R6, among all methods, ours is very simple, has *no* parameter, and yet it is accurate.

##R8##
Q: could it be combined with alternative method?
Indeed. Our goal was to demonstrate the effectiveness of the metric on its own.
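The reply on matrices of different rank can be checked numerically with the minimal NumPy sketch below. It assumes the square loss, for which minimizing the barycenter objective over C with the couplings T_s held fixed gives the closed form C = (Σ_s λ_s T_s C_s T_s^T) / (p p^T), with elementwise division and Σ_s λ_s = 1. `barycenter_update` is a hypothetical name, each T_s is assumed to have shape (N, N_s) with marginals (p, p_s), and the scaled permutation couplings are chosen only because their marginals are exactly uniform. Each term T_s C_s T_s^T is a congruence of a PSD matrix, and dividing by p p^T is the congruence by diag(1/p), so PSD is preserved; ranks, however, add up across inputs.

```python
import numpy as np


def barycenter_update(couplings, inputs, weights, p):
    """Square-loss GW barycenter update with the couplings held fixed.

    couplings[s]: (N, N_s) array with row sums p and column sums p_s (assumed).
    inputs[s]:    (N_s, N_s) similarity matrix C_s.
    weights:      nonnegative lambda_s summing to 1.
    """
    acc = sum(w * T @ C @ T.T for w, T, C in zip(weights, couplings, inputs))
    return acc / np.outer(p, p)  # elementwise division by p p^T


rng = np.random.default_rng(0)
N, S = 10, 3
p = np.full(N, 1.0 / N)
inputs, couplings = [], []
for _ in range(S):
    x = rng.normal(size=N)
    inputs.append(np.outer(x, x))       # rank-1 PSD input similarity
    P = np.eye(N)[rng.permutation(N)]   # permutation matrix
    couplings.append(P / N)             # scaled permutation: uniform marginals
C = barycenter_update(couplings, inputs, np.full(S, 1.0 / S), p)

print(np.linalg.eigvalsh(C).min() >= -1e-10)  # True: PSD is preserved
print(np.linalg.matrix_rank(C))               # 3: ranks add up across inputs
```

With three rank-1 inputs the update returns a rank-3 matrix, which is why low rank has to be enforced explicitly (e.g. by a rank penalty) when it matters.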