Paper ID: 1242
Title: Control of Memory, Active Perception, and Action in Minecraft

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a new deep RL benchmark (based on the Minecraft game) and present a variety of neural network policy architectures that incorporate memory. The results convincingly show the improvement attained with the proposed architectures.

Clarity - Justification:
I appreciate the thorough explanation of the architectures and the discussion of the implementation.

Significance - Justification:
The authors demonstrate substantial improvement over prior work on an interesting new benchmark task.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Overall, this is quite a nice paper. It makes two valuable contributions to the fields of reinforcement learning and deep learning:

1. An interesting new benchmark that is intermediate in difficulty between Atari games and continuous control tasks, and that thoroughly exercises the need for memory.
2. Network architectures for incorporating memory into deep RL, together with an empirical evaluation that convincingly demonstrates the improvement in generalization gained from adding memory.

The empirical results are quite convincing and the improvement in generalization is impressive. I also like that the authors compare training and test environments.

If there is a weakness in the work, it is that the novelty is not all that high. There are a variety of ways of incorporating memory into neural net policies, and while the proposed approach appears to be quite effective, it is a somewhat incremental change to existing methods. However, on the whole, I believe that this paper will be interesting to a substantial portion of the ICML audience.

Some additional papers that should be cited/discussed:
Asynchronous Methods for Deep Reinforcement Learning (Mnih et al.) [very recent, so may have been missed] -- addresses a similar maze task
Learning Deep Neural Network Policies with Continuous Memory States (Zhang et al.) -- addresses memory
End-to-End Training of Deep Visuomotor Policies (Levine et al.) -- addresses neural network policies from images
Embed to Control (Watter et al.) -- addresses latent-variable models that could in principle incorporate memory
Deep Auto-Encoder Neural Networks in Reinforcement Learning (Lange et al.) -- addresses neural network policies from images and an embedding mechanism that could in principle be extended to incorporate memory

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors introduce and evaluate three architectures that extend DQN with a form of memory. The architectures are evaluated on various maze-based tasks in Minecraft. The tasks are challenging because they combine delayed rewards with partial observability and high-dimensional visual input. In the experiments, special emphasis is placed on how well the architectures generalize, by creating variations of each task and dividing task instances into training and test sets.

Clarity - Justification:
In general, the paper is easy to follow. Certain sections, such as the part describing the memory and the controller, were a bit harder to follow without specific background. In particular, it was not fully clear to me why certain choices were made for the controller or the read function.

Significance - Justification:
The paper is very relevant. It addresses one of the open questions in deep reinforcement learning: how memory can be used to deal with partial observability. Minecraft looks like a great platform for testing new strategies because it provides a very controllable 3D environment.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I enjoyed reading this paper. The experiments are set up well, with a training set and a test set to evaluate the generalization properties of the architectures. Minecraft seems like a great framework for testing algorithms, because it provides high-dimensional visual input but still allows for creating very specific tasks (as opposed to, for example, the Atari framework).

As a small critical point, I don't think the architectural choices were particularly well motivated. I am sure the authors have good reasons for why particular choices were made for the architectures, and it would have been nice to learn a bit more about those reasons. For example, an analytical discussion of the potential weaknesses and strengths of each memory strategy would have been welcome before going into the experiments.

Overall, I think this is an original and relevant paper, hence I recommend acceptance.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces a novel Minecraft task as an alternative to the Atari games for studying reinforcement learning algorithms. A memory-based reinforcement learning architecture is proposed for the task, which addresses an external memory based on temporal context. The architecture is essentially used as a function approximator for the Q-function, and learning is performed in a standard manner.

Clarity - Justification:
The paper is well written.

Significance - Justification:
See comments below.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The main contributions of this paper are the introduction of the new Minecraft task to the RL community and the new memory-based RL framework. The paper is in general well written, and the experiments show the advantage of the proposed memory-based algorithm over the baselines.

What remains unclear is whether the Minecraft task or the memory-based RL framework is claimed as the major contribution. If the latter is the case, then the proposed method should also be evaluated on standard benchmarks such as Atari. Furthermore, it would also be important to see the influence of the external memory size on performance.

=====