We appreciate the detailed comments and suggestions.

***R1*** The primary concern was the novelty/value of the propositions.

Proposition 2.1: The purpose here is to set the stage for comparison with subsequent results. Note also that there are many cases where VB can in fact produce quite different results from MAP.

Proposition 2.2: The variable transformation from VB-BG to VB-GSM is only briefly and qualitatively mentioned in Section 4.1 of Nakajima et al. (2013a) in the context of fully observable models, i.e., where every element of X is directly observed with additive Gaussian noise. The NP-hard partially observable situation (to which our entire paper, including Proposition 2.2, is devoted) is considerably different.

Propositions 2.3 and 2.4: The acute sensitivity to dictionary correlation that we quantify is a significant and previously unobserved limitation of VB-BG. The reviewer suggests that forcing Sigma_B to be diagonal will fix the problem, but this heuristic provably just renders VB-BG equivalent to VB-GSM. Therefore VB-GSM does maintain its advantage over VB-BG when the latter is deployed in its standard, commonly used form. Incidentally, the sensitivity of VB-BG to a change in coordinate system is not a common attribute of sparsity algorithms. For example, consider solving min_x f(x) s.t. y = A*x, where f(x) is some sparsity penalty. We would hope/expect that the solution remains unchanged if we redefine the feasible region via an invertible transform W, giving W*y = W*A*x. For the Lasso, VB-GSM, and most other methods nothing changes; however, VB-BG yields a radically different solution, as we have elucidated with Propositions 2.3 and 2.4 (a short numerical sketch of this invariance for the l1 penalty is appended at the end of this response).

Proposition 2.5: While it is true that the effective prior over x is equivalent to the Lasso penalty, this is very different from showing that the VB-BG estimator itself exactly converges to the Lasso objective in the strongly informative limit. If this has been previously shown in the literature, we would be eager to see a specific reference so we can cite it.

Proposition 3.1: The reviewer mentions that this result is not intuitive. However, we note that below the formal statement, in lines 640-678, we provide an intuitive argument for why this result is both important and directly related to existing methods. In brief, we demonstrate that VB-BG collapses to what amounts to a basic constrained affine rank minimization problem characterized by numerous local minima. This provides the first concrete explanation for why it is not competitive with VB-GSM for this type of problem.

To conclude, as we form more complex models upon the building blocks of sparse and low-rank terms as in eq. (1), both the sensitivity to dictionary correlations and the local minima from affine low-rank terms will indeed be inherited, which speaks to the broad applicability of our results. Therefore we make a strong case that for optimal estimation quality VB-GSM may be preferable, while for computational efficiency VB-BG has an advantage because it can more directly exploit prior information about rank. We have not seen this distinction quantified previously for the important cases we describe.

***R2*** 1) Sensibility of VB approximation: Indeed this is an interesting question, with many subtle viewpoints. For example, in the case of sparse estimation with overcomplete dictionaries, the VB approximations never actually converge to the true posterior as both dimensions grow proportionally.
However, VB-GSM provably smooths out bad local minima, allowing maximally sparse solutions to be found more reliably than any other algorithm we are aware of when correlations are present in the dictionary. Therefore the justification for the model is really based on properties of the objective, irrespective of its proximity to the full model with which we started. Regardless, the stated reference provides a nice complementary perspective that could be woven into this analysis.

2) Convergence issues: When applied to the proposed sparse and low-rank models, the global minimum of VB-GSM coincides with the maximally sparse, lowest-rank solutions. Of course finding such solutions is NP-hard, so we cannot guarantee that VB-GSM (or any possible algorithm) will always succeed. However, with some effort VB-GSM can be implemented such that convergence to a local minimum (or general stationary point) is reached, and VB-GSM often provably has fewer local minima than its MAP counterparts. Finally, thanks for the reference to Rohde & Tsybakov (2011); it is a helpful detail for contextualization.

***R3*** Please see the comments above that relate to the significance of our results, which are key in deciding which form of VB to use. However, please note that we never claim that low-rank models will be sparse, nor that there is somehow overlap between the two. Rather, we consider models composed of different sparse and low-rank factors.
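As a minimal illustration of the coordinate-system invariance discussed under Propositions 2.3 and 2.4, the sketch below solves basis pursuit (the l1 penalty with exact constraints) once with the original constraint y = A*x and once with the transformed constraint W*y = W*A*x for a random invertible W; the two recovered solutions coincide up to solver tolerance. This is only a hedged Python sketch of the generic l1 case via scipy's standard LP reformulation, not of VB-BG or VB-GSM themselves; the function name, variable names, and problem sizes are our own illustrative choices.

```python
# Sketch: the l1 (basis pursuit) solution to min ||x||_1 s.t. A x = y is
# unchanged when the feasible region is rewritten as W y = W A x for an
# invertible W. Solved as a linear program via the split x = u - v, u, v >= 0.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||x||_1 subject to A x = y, via the standard LP reformulation."""
    n, m = A.shape
    c = np.ones(2 * m)                     # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])              # A (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * m))
    u, v = res.x[:m], res.x[m:]
    return u - v

rng = np.random.default_rng(0)
n, m, k = 20, 50, 3                        # overcomplete dictionary, k-sparse signal
A = rng.standard_normal((n, m))
x0 = np.zeros(m)
x0[rng.choice(m, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x0

W = rng.standard_normal((n, n))            # invertible with probability one

x_hat = basis_pursuit(A, y)                # original feasible region
x_hat_W = basis_pursuit(W @ A, W @ y)      # transformed feasible region

print(np.max(np.abs(x_hat - x_hat_W)))     # ~0 (up to solver tolerance)
```

Because W is invertible, the feasible sets {x : A x = y} and {x : W A x = W y} are identical, so any penalty that depends only on x inherits this invariance; the point of Propositions 2.3 and 2.4 is that the VB-BG objective does not.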