Paper ID: 22
Title: A Deep Learning Approach to Unsupervised Ensemble Learning

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes using a Deep Belief Network (stacked RBMs) to discover how to optimally combine classifiers without extra training labels.

Clarity - Justification:
The overall flow of the paper was nice --- especially starting with Dawid and Skene, which makes the idea more comprehensible.

Significance - Justification:
While the original approach of Jaffe (2015) was the first I had seen that tries to explicitly model correlations between classifier outputs (without ground truth), this paper is much nicer. Here, a deep belief net gets to choose its own depth and structure to best model the classifier outputs. Also, there are not that many stacked-RBM papers that yield state-of-the-art results on tasks and data sets that people care about. Good job!

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Overall, a very nice paper. You may wish to point out that, for the DREAM data, the proposed method performed as well as Jaffe (2015) exactly in the case where you would expect: when it chose a 2-layer RBM. I understand the plot on the Magic data, but it's a little unsatisfying. Is there some sort of frequentist test you can do to show that these improvements are not fluctuations? For example, I would find it very convincing if you ran McNemar's test on the different ensemble methods (see [1]) and found that the stacked RBM is statistically significantly better than the other techniques. You may wish to lump all of the datasets together for a single McNemar test (see the McNemar sketch after the reviews). Somewhere (in the Appendix?) it would be nice to list the hyper-parameters that performed best on each task (did you tie the hyper-parameters for DREAM and Magic?). Were the hyper-parameters tuned on the log-likelihood of a holdout set? I would be concerned about the integrity of the results if not.

References:
1. Dietterich, Thomas G. "Approximate statistical tests for comparing supervised classification learning algorithms." Neural Computation 10.7 (1998): 1895-1923.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes to use a stack of RBMs for unsupervised ensemble learning. More specifically, it proves the relationship between the RBM and the DS model in the conditionally independent case, where the RBM has only one hidden unit. To handle the situation where the conditional independence assumption does not hold, the paper proposes to use a stack of RBMs. The paper also discusses how to set the number of hidden units for the stack. Experimental results on simulated and real-world datasets show the advantages of the proposed method.

Clarity - Justification:
The English of the paper is good, and the paper provides enough details to reproduce the experiments.

Significance - Justification:
The paper shows that an RBM with one hidden unit can be used to estimate the posterior of the true labels given conditionally independent observations, which is interesting. After that, the paper describes the motivation for using a stack of RBMs to deal with the case where the conditional independence assumption does not hold, with the RBMs from layer 1 to layer n-1 serving as preprocessing blocks that decouple the inputs (see the stacked-RBM sketch after the reviews).
It would be much better if the authors could mathematically prove that the stack of RBMs, pretrained layer by layer, yields conditionally independent features.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This paper shows the relationship between the RBM and the DS model, and provides a simple method that uses a stack of RBMs to decouple the dependence of the inputs for unsupervised ensemble learning. Although the paper provides comparable results on both simulated and real datasets, the lack of a mathematical foundation for why simply stacking RBMs decouples the features reduces the impact of the paper.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper introduces an RBM-based DNN for ensemble learning. It proves that a one-layer RBM can model the joint probability P(X, Y) when conditional independence holds. To handle the case when conditional independence does not hold, the paper stacks RBMs; the manuscript also gives a heuristic way of determining the structure of the network.

Clarity - Justification:
The presentation is clear, and it should be easy to reproduce the results since it uses stacked RBMs.

Significance - Justification:
It uses stacked RBMs to do unsupervised aggregation. It may be of interest to many machine learning researchers. However, there is no theoretical proof for the selection of the DNN structure, which is very critical for unsupervised learning.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This manuscript introduces stacked RBMs to do clustering. The idea is interesting and the approach is reasonable. For unsupervised learning, the structure of the network is very critical. It is good to have a heuristic way to determine the structure, but it is still not convincing enough. Note that if the DNN is deep enough, it can easily overfit the distribution; for example, it can assign all samples the same label. Thus, the DNN should be regularized (e.g., an autoencoder has a small layer in the middle). Overall, proposing a way to determine the structure of a stacked-RBM-based DNN for clustering is interesting, but more work may be needed to show that this approach is valid.

=====
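
Reviewer #1 suggests comparing the ensemble methods with McNemar's test (Dietterich, 1998), pooling all datasets into a single test. Below is a minimal sketch of that comparison, assuming binary labels and statsmodels' mcnemar implementation; the prediction arrays (stacked_rbm_preds, sml_preds) are hypothetical placeholders, not names from the paper.

```python
# McNemar's test on the paired disagreements of two ensemble methods.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_compare(y_true, pred_a, pred_b):
    """Compare two classifiers on the same (pooled) examples."""
    correct_a = (pred_a == y_true)
    correct_b = (pred_b == y_true)
    # 2x2 table: rows = method A correct/wrong, columns = method B correct/wrong.
    table = np.array([
        [np.sum(correct_a & correct_b),  np.sum(correct_a & ~correct_b)],
        [np.sum(~correct_a & correct_b), np.sum(~correct_a & ~correct_b)],
    ])
    return mcnemar(table, exact=True)  # result.pvalue < 0.05 => significant difference

# Hypothetical usage: concatenate labels and predictions from all datasets first.
# result = mcnemar_compare(y_true, stacked_rbm_preds, sml_preds)
# print(result.statistic, result.pvalue)
```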
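
Reviews #2 and #3 describe the pipeline as layer-wise pretrained RBMs whose top layer has a single hidden unit estimating the posterior of the true label. The following is a rough illustrative sketch of that kind of stack, assuming scikit-learn's BernoulliRBM as a stand-in for the paper's implementation; the layer sizes and hyper-parameters are made up, not the paper's.

```python
# Greedy layer-wise pretraining of a stack of RBMs over binary classifier votes.
import numpy as np
from sklearn.neural_network import BernoulliRBM

def fit_stacked_rbm(votes, layer_sizes=(16, 8, 1), n_iter=50, lr=0.05, seed=0):
    """votes: (n_samples, n_classifiers) binary matrix of classifier outputs."""
    rbms, H = [], votes
    for size in layer_sizes:
        rbm = BernoulliRBM(n_components=size, learning_rate=lr,
                           n_iter=n_iter, random_state=seed)
        H = rbm.fit(H).transform(H)  # hidden probabilities feed the next layer
        rbms.append(rbm)
    # With a single top hidden unit, H[:, 0] plays the role of the estimated
    # posterior of the true label (up to a possible 0/1 flip).
    return rbms, H[:, 0]

# Hypothetical usage:
# rbms, posterior = fit_stacked_rbm(votes)
# y_hat = (posterior > 0.5).astype(int)
```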