Paper ID: 500
Title: Deep Structured Energy Based Models for Anomaly Detection

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The idea of this paper is to take a neural network (fully connected, CNN, RNN) and have it output an energy function rather than a classification or regression. This yields a flexible density estimator which can then be used for anomaly detection. When the energy function includes a quadratic term on the input variables, the learning problem can be cast as learning a (denoising) autoencoder, where the gradient of the energy with respect to the input variables produces the reconstruction. This particular idea is old, but the combination with deep architectures is new as far as I'm aware. After training the model, anomaly detection can be performed by thresholding the energy function or the reconstruction error.

Clarity - Justification:
For someone with the appropriate background I would say this paper is fairly clear. The one part I didn't quite get was the following quote from line 468: "However, x2 is a false positive example under the reconstruction error criterion, which energy correctly recognizes as an outlier." According to Figure 1, it seems like the reconstruction criterion correctly labels all three x's, while the energy labels x2 as a false positive, not the other way around. Maybe I'm misreading this?

Significance - Justification:
The main contribution of this paper is using deep, structured models to produce energy functions. It's a straightforward combination of ideas, but one that makes a lot of sense and could be useful in a number of areas outside of anomaly detection. The results on unstructured data are mixed, although the DSEBM does perform well overall. The real win is on structured data, where the underlying representation learned by the EBM can exploit prior knowledge of the underlying data.
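[Editor's note: for readers less familiar with the energy/autoencoder connection summarized above, here is a minimal NumPy sketch. It is my own illustrative construction, not the authors' code; the particular energy form (quadratic input term plus softplus hidden terms) is one standard choice consistent with the tied-weight autoencoder interpretation. Subtracting the energy gradient from the input yields the reconstruction, and either the energy or the reconstruction error can be thresholded to flag anomalies.]

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(x, W, b, b_prime):
    """Illustrative energy with a quadratic term on the inputs:
    E(x) = 0.5 * ||x - b'||^2 - sum_j softplus(w_j . x + b_j)."""
    return 0.5 * np.sum((x - b_prime) ** 2) - np.sum(np.logaddexp(0.0, W @ x + b))

def reconstruction(x, W, b, b_prime):
    """Input minus the energy gradient. For this energy it reduces to a
    tied-weight autoencoder with sigmoid hidden units:
    x - grad E(x) = b' + W^T sigmoid(W x + b)."""
    grad_E = (x - b_prime) - W.T @ sigmoid(W @ x + b)
    return x - grad_E

def is_anomaly_energy(x, W, b, b_prime, tau):
    """Flag x as anomalous when its energy exceeds a threshold tau."""
    return energy(x, W, b, b_prime) > tau

def is_anomaly_recon(x, W, b, b_prime, tau):
    """Alternative criterion: threshold the squared reconstruction error."""
    return np.sum((x - reconstruction(x, W, b, b_prime)) ** 2) > tau
```

In practice the threshold tau would be chosen on held-out data (e.g. to match an expected contamination rate); the review's point is that the two criteria above can disagree on particular inputs.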
I'm not up to date on the state of the art in anomaly detection, so I can't say for sure whether this paper compares to the state-of-the-art approaches.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This is a simple combination of existing ideas, namely deep EBMs, score matching training, and different kinds of neural networks, into a cohesive framework for anomaly detection. There are still some concerns that I have with the approach:

1. Score matching is local in nature, which means that although it is asymptotically consistent, in the finite-sample regime there could easily be regions of the energy landscape that assign low energy to non-data. This is a concern for anomaly detection, since it would be easy to construct anomalies that go undetected (see the work on adversarial examples).

2. The cost function is defined using the gradient of a neural network. This could be susceptible to vanishing gradients, making the EBM difficult to train. How does depth affect the performance of these models?

3. How were the hyperparameters tuned for these models? How sensitive are the results to these choices?

4. Score matching assumes continuous inputs, which limits this approach, although maybe the autoencoder scoring rules paper of Kamyshanska and Memisevic from ICML 2013 could help.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The main contributions are:
- Extensive experiments that compare EBMs with multiple baselines across many types of datasets for anomaly detection.
- A discussion of using the energy vs. the reconstruction error (gradient of the energy) as the criterion for thresholding. The paper empirically validates that energy is indeed the better criterion.

Clarity - Justification:
The paper is well written with clear explanations and analyses.

Significance - Justification:
This paper is significant mainly because of its extensive experiments.
These experiments show that deep models can be used effectively for this task. Having these numbers around will be very useful for measuring future progress on anomaly detection. The models are essentially autoencoders with tied weights for the encoder/decoder, so there is not much novelty from a modeling perspective. However, formulating them as EBMs trained with score matching is important because it allows us to define an energy and make the case for choosing that as the criterion to threshold on.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The main strength of this paper is the range of experiments. Comparisons are made against many baselines, and different kinds of datasets, which require different kinds of EBMs, are experimented with. The only weakness is that it lacks novelty, in the sense that training EBMs with score matching (= autoencoders) has been studied previously (for example, Swersky et al., 2011). Generalizing to convolutional or recurrent autoencoders is not a big leap, given that autoencoders for these structured models have been explored as well. However, using these models for anomaly detection is a novel application and the experiments are very compelling.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a collection of novel deep energy based models for the problem of anomaly detection, using EBMs to model the data distribution. The authors propose model architectures for three scenarios:
1. fully connected deep networks for the standard setting;
2. recurrent networks for sequential data; and
3. convolutional models for spatial data.
In addition to the model architectures, the authors also propose a novel training algorithm, which is an adaptation of the recently proposed score matching algorithm.
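[Editor's note: the score matching training referred to throughout these reviews can, following Vincent's (2011) connection between score matching and denoising autoencoders, be implemented as ordinary denoising-autoencoder training. Below is a minimal NumPy sketch under that reduction; the function and parameter names are illustrative, not the authors' code.]

```python
import numpy as np

def dsm_loss(X, recon_fn, sigma=0.1, rng=None):
    """Denoising score matching reduced to denoising-autoencoder training:
    corrupt each input with Gaussian noise of scale sigma, then penalize the
    squared error between the reconstruction of the corrupted input and the
    clean input. No sampling from the model is required."""
    rng = rng or np.random.default_rng(0)
    X_noisy = X + sigma * rng.normal(size=X.shape)
    R = np.apply_along_axis(recon_fn, 1, X_noisy)  # row-wise reconstruction
    return np.mean(np.sum((R - X) ** 2, axis=1))
```

Here `recon_fn` would be the energy-derived reconstruction (input minus energy gradient); minimizing this loss by backpropagation gives the end-to-end training the reviews describe.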
The benefit of such an algorithm is that it connects energy based models to regularized autoencoders, resulting in an end-to-end training algorithm that does not resort to sampling. The authors do a thorough comparison of their model against a variety of baseline models on a number of datasets belonging to the above three categories. The experimental results show that the proposed models are superior to the previously proposed baselines in most scenarios.

Clarity - Justification:
The paper is very clearly written and easy to understand, though I'm not quite sure how easy it would be to reproduce the results in the paper. It would be nice if the authors could also open-source the accompanying code.

Significance - Justification:
Though I'm not an expert in the area of anomaly detection, I think the ideas proposed in the paper are quite novel for the problem. Given the extensive experimental comparison against a variety of baselines on a number of datasets, the proposed model and the subsequent claims of its superiority are quite sound.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper proposes a novel application of energy based models, namely anomaly detection. To the best of my knowledge, I have not seen any prior work in anomaly detection that uses deep energy based models. In addition, the authors propose a novel end-to-end training algorithm for such models, which is an adaptation of the score matching algorithm.

Minor comments:
-- Lines 269-270: "This is a significant generalization of EBMs ...". The EBM framework does not make any assumption about the architecture of the underlying model other than that the energy function should be differentiable with respect to the parameters. Stating that the proposed models are a generalization of EBMs is a bit misleading.
-- Line 518: prevision --> precision?

=====