We thank all reviewers for helpful comments.$We principally promise to try experiments with different batch sizes and with a Neural Turing Machine as a baseline. Response to suggestions by Assigned_Reviewer_4: 1) It would be interesting to explore sparse keys. At first glance, it does not hurt. 2) A hierarchical data store is another interesting suggestion. The dimensionality of the key is not a problem as a high-dimensional key can be produced from a low-dimensional vector by a non-linear transformation.