We thank all reviewers for their helpful and detailed comments.

Reviewer 1
----------
Thanks for the useful suggestions; we will certainly make an effort to demonstrate more clearly the intuitions behind the constructions, as well as their limitations. We also fully agree with the reviewer's point about over-specified networks and generalization error, and will point it out more explicitly.

Reviewer 2
----------
Thanks for the comments. We agree that some of the results make strong assumptions, but others are more generally applicable, such as the path existence theorem in Section 3, which applies to any ReLU network, and Lemma 2, which sheds light on the benefits of overspecification for 2-layer networks. Regarding the singleton data, we completely agree that such data is uncommon in practice (as we also mention in the paper). Our goal was merely to provide an interesting brittleness example for constructions such as Auer's.

Reviewer 3
----------
Thanks for the comments; below are responses to some specific points:

Why is the second bullet of Definition 1 necessary?
- It guarantees that there cannot be suboptimal local minima. For example, consider the following scalar function, which satisfies the first bullet but not the second: f(x)=|x| for x<1, and f(x)=1 for x>=1.

line 405: what would it mean for there *not* to exist a continuous path? does this happen only when the activation weights are constrained?
- Thanks for pointing this out; we were indeed not specific enough. We meant a continuous path which *also* satisfies the assumptions stated in the theorems. We will fix accordingly.

line 471: it would be nice to have a geometric picture or interpretation of this condition.
- Thanks for the suggestion; we will further clarify this condition.

line 643: it might be nice to define the basin value closer to the first place it is used in the paper.
- We tried to define this when it is first used (at line 351).
However, if we missed an earlier reference, please let us know and we will fix accordingly.

Theorem 4: clarify that the theorem holds for any singleton distribution; the expectation is over the randomness in the initialization.
- We are not entirely sure what is meant here. Perhaps the reviewer refers to Theorem 2 rather than Theorem 4? In any case, we will clarify.

line 816: is it true that Bas(W,v) \leq \alpha iff the basin around W and v contains a global minimum? You might choose notation to make this clearer; for example, using p^\star in place of \alpha.
- Since the global minimum depends on the architecture and the number of neurons, we did not want to imply that it is a fixed quantity independent of the architecture. However, we will certainly try to use more suitable notation.

line 841: the quantifier \exists is not necessary.
- Thanks for catching this; it will be removed.
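As a side note, the scalar example given in the response to Reviewer 3's first question can be verified numerically. The following is a minimal sketch (the grid, the step size h, and the use of NumPy are illustrative assumptions, not part of the paper): points on the plateau x >= 1 are (non-strict) local minima whose value exceeds the global minimum f(0) = 0, i.e. suboptimal local minima.

```python
import numpy as np

# Scalar example from the response to Reviewer 3:
# f(x) = |x| for x < 1, and f(x) = 1 for x >= 1.
def f(x):
    return np.where(x < 1, np.abs(x), 1.0)

# Global minimum over a dense grid (grid choice is illustrative).
xs = np.linspace(-2, 3, 501)
global_min = f(xs).min()  # attained at x = 0, value 0

# Check that points on the plateau x >= 1 are (non-strict) local minima:
# a small step in either direction does not decrease f.
h = 1e-3
plateau = np.linspace(1.5, 2.5, 11)
is_local_min = (f(plateau) <= f(plateau - h)) & (f(plateau) <= f(plateau + h))

print(global_min)            # 0.0
print(bool(is_local_min.all()))  # True: plateau points are local minima
print(f(plateau)[0] > global_min)  # True: yet their value 1 is suboptimal
```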