We thank the reviewers for their insightful comments.

@Reviewer 1: “Significantly” refers to the difference in mean accuracy shown in Table 1, but we will delete “significantly” to avoid confusion.

@Reviewer 2:
1. How does this scale when the input memory gets very large? It's unclear whether this can scale to answer questions on Wikipedia articles (or, even more ambitiously, over all of Wikipedia). --> Great question. We will add an explanation in a “future work” sentence of the conclusion. For answering questions over all of Wikipedia, we could first run a simple and fast IR system to retrieve a subset of documents that seem at all relevant, and then run the DMN over that subset. A hierarchical attention module could also be used, taking advantage of Wikipedia's structure (categories, articles, sentences, etc.) to select the most relevant inputs.
2. bAbI is a good first step, but it's still pretty limited. Incremental improvements on sentiment are also nice, but not spectacular. What would make this amazing is to show that this sort of memory network can solve a problem that was previously thought too difficult for NNs (and solve it better than other techniques). Entailment? QA? --> Great idea; we plan to tackle more complex and larger-scale question answering problems in future work.

@Reviewer 3: We will make the novelty of the episodic memory module clearer. We like the “two strings mapping to one string” type of description and will add that the model is essentially a sequence-to-sequence model that can be conditioned on an additional sequence (the question) and can selectively attend to relevant inputs conditioned on that additional sequence.
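To make the “selectively attend to relevant inputs conditioned on the question” point concrete, below is a minimal sketch of that idea in PyTorch. It is an illustration under assumptions, not the paper's implementation or exact equations: the GRU encoders, the single-layer scoring function, and names such as QuestionConditionedAttention are all hypothetical choices made for the example.

```python
# Illustrative sketch: encode input sentences and the question with GRUs,
# score each sentence conditioned on the question, and return an attention-
# weighted summary of the inputs. Names and shapes are assumptions, not the
# paper's code.
import torch
import torch.nn as nn


class QuestionConditionedAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.input_encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.question_encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.score = nn.Linear(2 * hidden_size, 1)  # one attention score per sentence

    def forward(self, sentence_embs, question_embs):
        # sentence_embs: (batch, num_sentences, hidden) pre-embedded input sentences
        # question_embs: (batch, question_len, hidden) pre-embedded question tokens
        facts, _ = self.input_encoder(sentence_embs)        # (B, N, H) sentence encodings
        _, q = self.question_encoder(question_embs)         # (1, B, H) final question state
        q = q.squeeze(0).unsqueeze(1).expand_as(facts)      # broadcast question to (B, N, H)
        scores = self.score(torch.cat([facts, q], dim=-1))  # (B, N, 1) question-conditioned scores
        weights = torch.softmax(scores, dim=1)              # attention over sentences
        context = (weights * facts).sum(dim=1)              # (B, H) attended input summary
        return context, weights.squeeze(-1)


# Toy usage: 2 stories of 5 sentences each, 3-token questions, hidden size 8.
model = QuestionConditionedAttention(hidden_size=8)
ctx, attn = model(torch.randn(2, 5, 8), torch.randn(2, 3, 8))
print(ctx.shape, attn.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```

The attended summary would then condition an answer decoder, which is what makes the model a sequence-to-sequence system that takes the question as an additional conditioning sequence.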