Paper ID: 209
Title: Learning Physical Intuition of Block Towers by Example

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper investigates the ability of deep neural networks to learn to predict the outcomes of simple physical experiments. The authors do this by considering a setting where the task is to predict, from raw pixels, whether a stack of blocks is stable or not. They simulate the blocks in Unreal Engine and predict features of the outcome state from images of the initial state. They also explore the models' ability to generalise to taller stacks and to real images.

Clarity - Justification:
Great paper, easy to read. Figure 6 is a little unclear and could use a bit of work on its visuals.

Significance - Justification:
I applaud the paper's motivation and setting; however, the central themes of this work were originally introduced in Battaglia et al. (2013). The paper's contribution is to consider direct prediction (via a DNN) of the outcome without making use of a physics engine in the prediction process. There is little novelty other than to consider this setting.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is very well written; it considers an important and interesting question, and does so in a thorough manner. The technical contribution is slight; however, the paper moves the discussion forwards and the public release of PublicTorch is a nice touch. For this reason I think the paper is suitable for publication at ICML.

Table 1 is very interesting, showing that pre-training on ImageNet makes it easier for the model to transfer from simulations to real images.

The text in Figure 6 is difficult to read (too small), and the charts on the right print strangely. The green dots are not very visible. The authors may want to spend a bit more time on this figure.
Do the authors have any comments on the relationship between the stochasticity of the physical task and the need for simulations in the prediction process? As the tower grows taller, can we reasonably expect a deep net to capture all of the possible outcomes of the fall?

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The goal of this paper is to develop deep feed-forward models to learn intuitive physics. Using the Unreal Engine 4 (UE4) 3D game engine, the authors create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). They use this data to train convolutional network models which can accurately predict the outcome of these experiments, and are also able to estimate the block trajectories. The authors investigate two tasks: (i) Will the blocks fall over or not? and (ii) Where will the blocks end up? They estimate stability directly from image pixels, and the prediction model is learned from examples. The authors compare several deep neural networks on synthetic and real datasets, and they also compare the prediction accuracy of their models with that of humans.

Clarity - Justification:
The paper is well-written and easy to follow.

Significance - Justification:
The paper is not a typical paper that one might see at ICML, but I really like the application in this paper. This work is a nice demonstration that neural networks can go beyond object classification/detection and might be promising for understanding our environment and answering questions such as "What will happen next in this scene?"

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
It is also a strong point that the authors integrated the Torch machine learning framework into the UE4 game loop and made this software package publicly available. The studied problem is important and interesting.
It is a pure application paper; there are no significantly new methods proposed or developed in the paper. The application, however, is interesting.

Weak points:
The authors could also have integrated other machine learning methods into their framework. Many other ML methods could have been used for the studied classification and regression problems, but the authors focus only on convolutional neural networks.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper uses a ConvNet to predict the stability of a block tower configuration, and to predict where the blocks will fall if the configuration is unstable. Synthetic training examples are generated using a modification of the Unreal gaming engine. The trained network is tested on both real and synthetic examples and achieves good performance. Experiments on the trained network provide circumstantial evidence that non-trivial reasoning about the block configuration is being performed.

Clarity - Justification:
The paper is written well.

Significance - Justification:
The domain is fairly narrow, but the work is stimulating.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This is a competently executed work and I recommend acceptance. However, the ConvNet is still a black box and it's hard to conclude something from this work that is of broader interest. It is nice to know that ConvNets can do this, although perhaps not surprising at this point.
- Lines 122-124: "the lower layers perceive the arrangement of blocks and the upper layers implicitly capture their inherent physics". This is speculative and is not substantiated by the experiments. I recommend toning this down.
- I didn't see any mention of a validation set. The synthetic data seems to have been split into a training set and a test set. How were the architecture and the hyperparameters tuned?
Did anybody monitor the test set performance during the design of the network? If so, it's not surprising that it works so well on the synthetic data. =====