We thank all three reviewers for their detailed and insightful feedback. We will fix the typos and apply the suggested minor changes and recommended clarifications.

Reviewer 1
=========

There are a number of other Bayesian models for matrix factorization that the authors should consider citing…

A: Thanks for the pointers; we will add these references.

The authors should also expand the discussion of the hardness of the general problem with appropriate citations.

A: We do discuss the NP-hardness of matrix factorization (see line 294). This result extends to matrix completion in general, since matrix completion contains matrix factorization as a special case.

Even though the convergence of BP appears not to be an issue, the authors should consider convergent alternatives to BP for the MAP inference task (such as TRW, MPLP, max-sum diffusion, etc.), as they come with additional performance guarantees and often work well for MAP inference tasks.

A: We agree that this is an interesting direction! However, when BP converges, it is known to produce more accurate estimates than its convergent alternatives. Also note that our factor-graph is not an MRF, and simplifying the factor-to-variable messages for our high-order potentials may prove non-trivial for some of these convergent alternatives.

Reviewer 2
=========

(I would add to these Matsushita, R., & Tanaka, T. (2013). Low-rank matrix reconstruction and clustering via approximate message passing.) In fact the cited algorithms are written for a class of observation channels that includes both (1a) and (1b) and for general priors on the elements of X and Y, including Boolean.

A: Thanks for the pointer! This work differs from ours in several ways: it is defined in the real domain with an L2 loss, and it is a special case of the more general treatment of Kabashima et al. '14 (in our references), which uses AMP for a broad range of problems. Although these works have a similar expression for the posterior, their operation domain, factor-graph, and message updates differ from ours. We will cite this work in our related work section and better emphasize the differences.

Why do the authors compute the MAP of the posterior instead of computing the marginals? A version of the algorithm for marginals is in the appendix, but what motivates their choice of one over the other? They give an answer in the appendix, but I am not satisfied with it. If there is a strong ground-truth factorization, the marginals should be biased towards it and no decimation should be needed. On the other hand, if the marginals are not informative, wouldn't this suggest there is actually no significant "signal" in the data?

A: Even if there is a strong ground-truth factorization, it is still permutation invariant (see line 275; a short illustration of this symmetry is given at the end of this section). This symmetry necessitates decimation when working with marginals rather than MAP, which in turn increases the cost of message passing. In addition, we found that sum-product BP often has a hard time converging, even for factorization problems. We speculate that this behaviour is due to the large number (i.e., K!) of symmetric solutions (corresponding to ground states), which generally have a large Hamming distance from one another.

It should be clearer in the introduction that the authors do not provide nor test the algorithm for the XOR problem (1b), only on (1a).

A: We will emphasize this in the revised version.
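To make the symmetry argument above concrete, here is a minimal sketch (notation ours, assuming (1a) is the usual Boolean OR-of-ANDs product): writing $Z = X \circ Y$ with $Z_{ij} = \bigvee_{k=1}^{K} X_{ik} \wedge Y_{kj}$, any $K \times K$ permutation matrix $P$ satisfies

$(X P^{\top}) \circ (P Y) \;=\; X \circ (P^{\top} \circ P) \circ Y \;=\; X \circ Y \;=\; Z,$

since $P$ merely relabels the $K$ hidden dimensions and the disjunction over $k$ is order-invariant. Hence even a unique ground-truth pair $(X, Y)$ yields up to $K!$ ground states of identical posterior value; the single-variable marginals mix over all of them, which is why decimation is needed to commit to one representative.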
Reviewer 3
=========

...one can speed up BMF by using graphical model inference. Conversely, one can speed up graphical model inference by using BMF, see Van den Broeck, G., and Darwiche, A. (2013).

A: Although it appears to concern "lifted" inference, this connection is very interesting. We will point it out in our related work section. Thanks for the pointer.