We would like to thank the reviewers for their helpful comments and feedback.

Regarding the comments from Reviewer_5:

Concerning the clarity of the exposition, we will clarify sections where possible and add one or two figures, corresponding respectively to the PixelCNN and the multi-scale model.

Concerning the result tables in the experimental section:
- For CIFAR-10 we added a new baseline result, the RIDE model at 3.47 bits/dim (Theis & Bethge, 2015), obtained through private communication with the authors. This result shows that the proposed models significantly outperform the most closely related model from prior work.
- For MNIST we will add results for the PixelCNN and the Row LSTM.
- For ImageNet we found that larger networks consistently gave better results, so we reported the best results we could obtain given GPU memory constraints and reasonable training time (this turned out to be the Row LSTM, as the BiLSTM uses more memory).

Regarding the comments from Reviewer_7:
- Concerning the comparison with other models, we are able to make a fair comparison with previous density models because those models add uniform noise to their image values; this is explained in Section 5.1 (a small illustrative sketch is included at the end of this response).
- As the reviewer notes, the model is not fully flip invariant, and it might be worthwhile to exploit this, for example by using flips for data augmentation. However, we decided not to use any data augmentation, as it would make it much harder to compare different models.

We would also like to stress the significance of this paper for the generative modeling community. We outperform the previous best generative models by a large margin. This is the first demonstration of a model that both achieves good log-likelihoods and generates high-quality samples and completions. With this paper we also hope to promote research on more interesting and challenging datasets such as CIFAR-10 and ImageNet, in addition to standard datasets such as MNIST.
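
To make the comparison point for Reviewer_7 concrete, below is a minimal sketch (assuming NumPy; the function names are illustrative and not from the paper) of the two conventions involved: continuous density models add uniform [0, 1) noise to the integer pixel values before evaluating log p(x), which is what makes their reported likelihoods directly comparable to those of our discrete model, and both are typically reported in bits/dim.

    import numpy as np

    def dequantize(images_uint8, rng):
        # Add uniform [0, 1) noise to integer pixel values, the standard step
        # continuous density models apply before evaluating log p(x).
        return images_uint8.astype(np.float64) + rng.uniform(size=images_uint8.shape)

    def bits_per_dim(nll_nats, num_dims):
        # Convert a total negative log-likelihood in nats into bits per dimension.
        return nll_nats / (num_dims * np.log(2.0))

For example, a 32x32x3 CIFAR-10 image has 3072 dimensions, so the RIDE baseline of 3.47 bits/dim corresponds to about 3.47 * 3072 * ln(2) ≈ 7389 nats per image.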