Paper ID: 1078
Title: Estimating Cosmological Parameters from the Dark Matter Distribution

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The paper presents an application of convolutional networks to infer the parameters of a cosmological model. One particularity of the model is that it is only available through stochastic simulation of data given parameters.

Clarity - Justification:

The paper is well written, although structured like a physics paper, which might be unusual to an ICML audience. I think this won't be an issue once proper references to Section 3 are added to Section 2.

Significance - Justification:

The paper makes an interesting and detailed case for using state-of-the-art ML in experimental physics; this kind of contribution is always welcome. My main concern is that the application seems an ideal case study for an ABC approach (see below), which would also provide some measure of uncertainty over the fitted parameters, while I am not sure how deep nets would provide that. It would be interesting to at least have ABC in the experimental comparisons.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

# Major comments

- L95: please give an accessible reference for details on LambdaCDM.
- L184: please give a reference for the "standard maximum likelihood approach", for example to the part of Section 3 where you explain it. This way of structuring the paper (results, then methods) is unusual in ML, so I think it is worth mentioning early on that the method is described later.
- Section 2: could you possibly apply ABC (approximate Bayesian computation; see e.g. [Marin, Pudlo et al., Approximate Bayesian computational methods, Statistics and Computing, 2012])? If you know what features of the spatial distribution you want to reproduce, those features should give you appropriate summary statistics. You would also get some proxy for a posterior distribution, which would help quantify uncertainty. In this application, where you only have a handful of parameters, ABC with importance sampling would probably work well and be easily parallelizable (a rough sketch of what I have in mind is appended after the minor comments).
- Figure 2: the power spectrum approach seems to be biased at large Omegas. Do you have an intuition why? What measure of uncertainty do you have on your estimates? It would be interesting to see a comparison to ABC posteriors here.
- Figure 3: please add axis labels and a colormap.
- Figure 4: do you have an intuition why the two approaches seem to have opposite biases?
- L389: the first two paragraphs detailing the maximum likelihood approach could be clearer. In particular, I think defining the likelihood through a formula would help clarify the text (an illustrative formula is appended below).
- L401: why these particular parameters?

# Minor comments

- L211: missing parentheses
- L213: missing word after "error"
- L398: \hat{P}_m(k) is undefined
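To make the ABC suggestion concrete, here is a minimal rejection-ABC sketch, assuming a Python setup. Everything in it is a hypothetical stand-in: `simulate` for the authors' expensive simulator, `summary` for whatever summary statistics one chooses (e.g. binned power spectrum), and the priors and tolerance for values that would need tuning.

```python
# Minimal rejection-ABC sketch; `simulate` and `summary` are hypothetical
# stand-ins for the paper's simulator and chosen summary statistics.
import numpy as np

rng = np.random.default_rng(0)

def simulate(omega_m, sigma_8):
    # Placeholder for an expensive N-body simulation returning a density field.
    return rng.normal(loc=omega_m, scale=sigma_8, size=1000)

def summary(field):
    # Placeholder summary statistics (binned power spectrum in practice).
    return np.array([field.mean(), field.std()])

def abc_rejection(observed, n_draws=10_000, tol=0.05):
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        # Draw parameters from the prior (uniform boxes as placeholders).
        omega_m = rng.uniform(0.1, 0.5)
        sigma_8 = rng.uniform(0.5, 1.1)
        s_sim = summary(simulate(omega_m, sigma_8))
        # Keep draws whose simulated summaries land close to the observed ones.
        if np.linalg.norm(s_sim - s_obs) < tol:
            accepted.append((omega_m, sigma_8))
    return np.array(accepted)  # samples from an approximate posterior

posterior = abc_rejection(simulate(0.3, 0.8))
print(posterior.mean(axis=0), posterior.std(axis=0))
```

Each accepted draw costs one full simulation, but the draws are independent, which is why this parallelizes so easily.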
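On the L389 comment: the kind of formula I have in mind is the following (purely illustrative, since I am only guessing at the estimator used; \hat{P}_m(k) would be the measured power spectrum, P_m(k; \Omega_m, \sigma_8) the model prediction, and \sigma_k the per-bin uncertainty):

    \mathcal{L}(\Omega_m, \sigma_8) = \prod_k \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left[ -\frac{\left( \hat{P}_m(k) - P_m(k;\, \Omega_m, \sigma_8) \right)^2}{2 \sigma_k^2} \right]

Even if the actual likelihood differs, writing it down explicitly would resolve the L398 issue at the same time.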
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

This paper describes an application of deep 3D convolutional neural networks to estimate cosmological parameters from simulations of the expanding universe. The model is evaluated on two different types of simulations, used to predict two different cosmological constants, and compared to other models. The experiments show modest improvements over existing methods for fitting cosmological parameters.

Clarity - Justification:

This is a well-written paper.

Significance - Justification:

While there is nothing new from a machine learning perspective, this is an interesting application of 3D CNNs. I am convinced that this could be an effective approach.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

Deep learning is useful for learning multiple levels of structure -- is there any high-level structure here? The Fourier representation used in the analysis in 3.2 does pretty well, and it is a relatively shallow method; one could train a shallow convolutional network that would presumably learn the same types of features (a sketch of the sort of network I mean is appended after this review). There are only 2 kernels in the lowest layer of the network, with 3^2 parameters each. Have you looked at the learned kernels?

Some things that I wondered about: How long does each cosmology simulation take? Does the CNN approach have an advantage or disadvantage compared to the other methods in terms of the amount of training data required?
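On the shallow-network point: a minimal sketch, assuming a PyTorch-style 3D conv net. The input box size (64^3 voxels), pooling size, and two-parameter regression head are illustrative placeholders, not taken from the paper.

```python
# Hypothetical shallow 3D conv net of the kind suggested above; layer widths
# and the input box size are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

shallow_net = nn.Sequential(
    nn.Conv3d(1, 2, kernel_size=3, padding=1),  # a few small kernels, as in the paper's lowest layer
    nn.ReLU(),
    nn.AvgPool3d(kernel_size=4),                # average pooling, per the authors' preference
    nn.Flatten(),
    nn.LazyLinear(2),                           # regress the two cosmological parameters
)

x = torch.randn(8, 1, 64, 64, 64)  # batch of simulated density boxes
params = shallow_net(x)            # predicted (Omega_m, sigma_8) pairs
print(params.shape)                # torch.Size([8, 2])
```

If such a network matched the Fourier baseline, it would clarify how much of the gain actually comes from depth.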
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

I found this to be an intriguing paper and an exemplar for applications of ML in a scientific domain. Estimating cosmological parameters is an intrinsically fascinating topic, and the field is growing increasingly sophisticated. While classical procedures like maximum likelihood estimates on power spectra might seem old-fashioned, in the context of cosmology there is a much stronger justification for their usage, which makes it all the more difficult to introduce novel methods. This paper makes a strong case that such improvements over classical techniques are possible, and the techniques themselves will be interesting to the ICML audience. The technical details of the 3-d conv-net are interesting. The physical intuition attached to the filters is interesting, and the comments about max-pooling vs. average pooling are particularly interesting because one can see how the physics problem might differ from image classification etc. and genuinely prefer average pooling.

Clarity - Justification:

I thought the text was quite clear for both the physics content and the ML content. I was slightly confused at times about the pairing of the two examples, methods, and results; however, that was fixed by more careful reading. One place where I thought the clarity could be improved was the discussion of the average relative error in the results section. There are comparisons of two approaches, and it's not clear what the uncertainty on those numbers is. That seems to be addressed in subsequent paragraphs, but it is a bit hard to follow, since one is talking about the uncertainty on an uncertainty. After re-reading a few times I guessed at what was meant and moved on.

Significance - Justification:

The significance for cosmology is quite high. It's work on the way to a more ambitious program, but a good step in that direction. On the ML side, the usage of 3D conv-nets is interesting. The insight about max/average pooling is nice. The details of constructing a large training set by sub-sampling large, expensive simulations are also interesting (and I suspect this will eventually become more of a point of scrutiny for this approach).

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

Minor points in the text:

-- for non-physicists, clarify that redshift z=0 means now, and that large z means long ago / far away
-- "shooting gradients" could use some more discussion
-- "reproduce the statistics of large scale structures" -- jargon
-- "intend to go beyond this measure." -- unclear
-- it would be nice to show the covariance matrix for the results, particularly given the strong degeneracy of \sigma_8 and \Omega_m in the power spectrum analysis (a small sketch of the summary I have in mind is appended below)
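On the covariance point: a minimal numpy sketch of the summary I mean. The `estimates` array here is synthetic, standing in for the model's (\Omega_m, \sigma_8) predictions over test boxes; the numbers are placeholders chosen only to mimic an anti-correlated degeneracy.

```python
# Hypothetical sketch of the covariance summary suggested above; `estimates`
# stands in for the model's (Omega_m, sigma_8) predictions on test boxes.
import numpy as np

rng = np.random.default_rng(0)
estimates = rng.multivariate_normal(
    mean=[0.30, 0.80],                   # placeholder fiducial values
    cov=[[4e-4, -3e-4], [-3e-4, 4e-4]],  # anti-correlated, mimicking the degeneracy
    size=500,
)

cov = np.cov(estimates, rowvar=False)  # 2x2 parameter covariance
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(cov)
print(f"Omega_m / sigma_8 correlation: {corr:.2f}")
```

=====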