Paper ID: 943
Title: Learning Convolutional Neural Networks for Graphs

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a method which breaks a graph into "receptive fields" and extracts features from them, in a vaguely similar way to how convnets do this for 2D image data. I think the approach in this paper is solid, seems original to me, and could be potentially useful, so I'm recommending the paper be accepted.

Clarity - Justification:
The paper is pretty difficult to read, however, due to a somewhat disorganized structure, and I suggest the authors work on improving it. The introduction is particularly hard to follow, and there is a lot of jargon throughout that is never properly defined. I had to read the paper at least twice to understand what was actually going on.

Significance - Justification:
Seems original, but somewhat unnatural and not obviously better than existing alternatives.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

Line 111: Once you have these features from the graphs, how would you apply a convolutional architecture on top? What is the spatial structure between these features that you are using?

Line 311: You never actually define what "canonicalization" means here.

Line 382: When you say "graph space to tensor space", what do you mean precisely? Do you mean mapping vertices in the neighborhood graph to positions in a vector (i.e. a linear order)? If so, you should explain it like this. If not, please be more precise and avoid using overly general jargon like "tensor". I know tensors are more general than vectors, but do you ever actually map to ND tensors for N greater than 1?

Line 489: How can you define a labeling such that it can be applied to any graph of k nodes? Do you mean a procedure for generating a labeling? Or is l defined separately for each graph in the collection?
In such a case, it doesn't seem correct to write it as a function of the graphs themselves, but rather of their identity or index within the collection.

Line 519: The notation appears inconsistent here: A_{\ell(G)} vs. A^\ell(G).

Line 520: The issue with this result is that it doesn't really use any structure of the problem, and its usefulness is questionable since minimizing \hat{\theta} is still going to be hard, even for a collection of size 2.

Line 525: You should explain why this assumption is reasonable.

Line 546: Does this "NAUTY" approach use anything like the "unsupervised" objective you allude to in Theorem 2?

Line 595: When you feed these graph features into a CNN, do you assume a 1D spatial structure to the data, based on the index order of the receptive fields? Isn't this somewhat arbitrary?

Typos: "the are" at line 212

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper presents an approach based on convolutional neural networks for arbitrary graphs. In the proposed approach, the nodes of the graph are first put in sequence. Then a receptive field is constructed, based on the node neighborhood, and normalized. CNNs can then be learned on the receptive fields. The proposed approach is evaluated on graph classification and yields competitive performance.

Clarity - Justification:
This paper is clearly written and well structured.

Significance - Justification:
Applying CNNs to graphs is a good idea, as their hierarchical processing capabilities can lead to efficient and fast processing on graphs. The proposed PATCHY-SAN implementation seems solid. The paper lacks comparison with related works (see Detailed comments). On the experimental side, the graph classification study is quite convincing.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The runtime analysis (Section 6.1) is quite obscure to me, as there is no baseline to compare with.
But the timing information in Section 6.3 is convincing.

I think you should cite and compare your work to this paper:
Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun, "Spectral Networks and Locally Connected Networks on Graphs", ICLR 2014
and also to the wavelet-based approaches for graphs, such as:
Raif M. Rustamov and Leonidas Guibas, "Wavelets on graphs via deep learning", NIPS 2013

Regarding the CNN architecture, the networks used in this paper have only two convolution layers. I'm curious about the performance of the system when more convolution layers are added.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes computing graph-based features for graph classification via local receptive fields on graphs, by discovering local graph regions and mapping them to receptive fields of CNNs. After this first stage, the paper claims "The rest of the architecture can be chosen arbitrarily" (Ln 521), suggesting that this region discovery is used for one layer only. It shows results on classifying graphs and proofs of the complexity required for the proposed mapping process.

Clarity - Justification:
The paper is not clear. What problem does it try to address? What is the use of one convolutional layer? What if those local regions across graphs do not correspond exactly? The pictures, apart from Fig. 3, are not very informative: e.g., Fig. 1 shows a CNN used on images, not related to this work, Fig. 2 is too generic, and in Fig. 3 it is not clear what the green and yellow nodes represent. The experiments are not informative either: they start with runtime analysis and visualization and only then show classification results. The paper reads as: the graph structure is discovered using standard node labelling techniques, the receptive field mapping uses techniques for discovering graph isomorphisms, and once those are there, shared weights are learnt for each such graph neighborhood.
Missing citation: Mikael Henaff, Joan Bruna, Yann LeCun, "Deep Convolutional Networks on Graph-Structured Data"

Significance - Justification:
The contribution of the paper is not clearly articulated. I think the contribution of the paper is using learning to compute features based on local graph-based "receptive" fields discriminatively for the task at hand, as opposed to using a generative graph kernel.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
There are a lot of things I have not understood in the paper, e.g.:
1) In each dataset for graph classification, are the graph nodes in correspondence with each other?
2) Is finding regions (to share weights) on top of the first CNN layer not important?
3) How were the baseline graph kernels used? In the training set, their statistics are computed, and then in the test set a graph is classified according to some regressor or nearest neighbor?
4) How are the node and edge attributes used by the baselines?
5) What if we train classifiers on statistics of the baseline graph kernels?

I am not an expert in this area and I would be very willing to change my rating if things clear up during the rebuttal.

I believe the comparisons of the paper with images are not informative. It'd be more informative to describe the baselines that it uses; these are partly described in related work, but it'd be important to mention them in the experiments instead or in addition.

In Ln 211 it is mentioned: "In contrast, the our approach learns these substructures from the graph data and is not limited to a predefined set of structures". How is this achieved, since the graph structure is fixed according to the node labelling algorithm that is chosen?

Overall, it seems that the discovery of the graph structure of receptive fields is disjoint from the graph classification task, so it is a two-step process.
Adding weights on top of the predefined receptive fields may not be very important if those structures are not correct or tied to the task.

=====
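For concreteness, the pipeline the reviews discuss (put the nodes in a sequence, extract a fixed-size neighborhood per node, then apply a shared-weight 1D convolution over the resulting grid) can be sketched as below. This is an illustrative stand-in, not the paper's implementation: a simple degree-based ordering replaces the paper's labeling/canonicalization procedure, and the helper names `receptive_fields` and `conv1d` are hypothetical.

```python
# Illustrative PATCHY-SAN-style sketch (NOT the paper's exact algorithm).
# The degree-based ordering below stands in for the centrality/NAUTY-based
# labeling the paper uses; attributes are single floats for simplicity.

from collections import deque

def receptive_fields(adj, attrs, w, k):
    """For the top-w nodes under the ordering, gather up to k nodes via
    BFS and return their attributes as a w x k grid (zero-padded).

    adj:   dict node -> list of neighbor nodes
    attrs: dict node -> scalar attribute
    """
    # 1) Put nodes in a sequence using a labeling (degree, then node id).
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    fields = []
    for root in order[:w]:
        # 2) Assemble a neighborhood of up to k nodes by BFS from root.
        seen, queue = [root], deque([root])
        while queue and len(seen) < k:
            for u in adj[queue.popleft()]:
                if u not in seen and len(seen) < k:
                    seen.append(u)
                    queue.append(u)
        # 3) "Normalize": impose the same linear order and pad to length k,
        #    so every receptive field has identical shape.
        seen.sort(key=lambda v: (-len(adj[v]), v))
        fields.append([attrs[v] for v in seen] + [0.0] * (k - len(seen)))
    return fields

def conv1d(fields, kernel):
    """One shared-weight 1D convolution slid over each receptive field.
    This is the (arguably arbitrary) 1D spatial structure Review #1 asks
    about: position = index under the chosen node ordering."""
    m = len(kernel)
    return [[sum(kernel[j] * row[i + j] for j in range(m))
             for i in range(len(row) - m + 1)]
            for row in fields]
```

On a small 4-node graph, the grid returned by `receptive_fields` is what an off-the-shelf 1D convolution can slide over; the point of the sketch is that the neighborhood discovery is a fixed preprocessing step, disjoint from the learned convolution weights, which is the two-step structure Review #3 remarks on.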