Paper ID: 1299
Title: Partition Functions from Rao-Blackwellized Tempered Sampling

Review #1
=====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces a new partition function estimator based on simulated tempering (TS). The main idea is to use Rao-Blackwellization to improve the estimate of the marginal probabilities over the "inverse temperatures" used by the TS algorithm. The resulting algorithm (called RTS) is shown to have connections with several existing sampling methods. Empirical results on mixtures of Gaussians and RBMs show that the technique performs well in practice.

Clarity - Justification:
The paper is very well written. The material presented is easy to follow and provides a nice overview of the broader context.

Significance - Justification:
My main concern is that the novelty is somewhat limited: it mainly consists of using Rao-Blackwellization to improve an existing sampling / partition function estimation scheme, which feels like an incremental contribution. I personally found the connections drawn to many other existing methods very interesting, but because of them the paper ends up being a bit too scattered.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The experimental results are nice, although I would have liked to see a more in-depth comparison with plain TS, since that is really the method the authors are improving on. Based on Figure 4, at K=100 the non-Rao-Blackwellized estimator (TS) still performs reasonably well (better than the others); however, it appears to perform poorly in Figure 1. Since TS is the most direct competitor to RTS, it would also have been nice to have a TS curve in Figure 2. Would it perform much worse than AIS/RAISE?
-- I found the use of hat{Z} and hat{Z}^new a bit confusing: hat{Z} is first treated as a constant parameter, then as a random variable in equations (15) and (16).
-- N in eq. (13) should be S, I think.
-- Line 466: why is it the minimum and not the maximum?
-- I assume "true" means ground truth in Figure 2. I am curious how that was computed, since the models seem large enough to be out of reach for exact inference.
-- The figures are too small to be readable, and Figure 3 is not readable if printed in black and white (also, Figure 3 comes before Figure 2).
-- Are all tempered schemes in Section 4.1 using the Hamiltonian / adaptive step sizes?
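-- For concreteness, here is a toy sketch of the estimator as I understand it (my own construction on a 1-D Gaussian with fixed uniform weights r_k, not the authors' code, and ignoring the adaptive updating of hat{Z} inside the weights): run simulated tempering on the joint q(x, beta_k) proportional to r_k f_k(x), and average the full conditional q(beta_k | x_t) instead of counting visits to each beta_k.

import numpy as np

rng = np.random.default_rng(0)

K = 10
betas = np.linspace(0.1, 1.0, K)                  # inverse temperatures
r = np.ones(K) / K                                # fixed prior weights over temperatures (toy choice)
f = lambda x, beta: np.exp(-0.5 * beta * x ** 2)  # unnormalized tempered densities f_k(x)
true_logZ = 0.5 * np.log(2.0 * np.pi / betas)     # closed form, used only to check the toy example

S = 20000
x, k = 0.0, K - 1
counts = np.zeros(K)   # plain TS: empirical visit counts over beta_k
rb = np.zeros(K)       # Rao-Blackwellized: accumulated conditionals q(beta_k | x_t)

for t in range(S):
    # Metropolis update of x at the current inverse temperature beta_k
    x_prop = x + rng.normal(scale=1.5)
    if rng.random() < f(x_prop, betas[k]) / f(x, betas[k]):
        x = x_prop
    # Gibbs update of the temperature index: q(beta_k | x) is proportional to r_k f_k(x)
    w = r * f(x, betas)
    w = w / w.sum()
    k = rng.choice(K, p=w)
    counts[k] += 1.0
    rb += w            # keep the whole conditional rather than a single indicator

# The stationary marginal over temperatures is proportional to r_k Z_k, so
# log Z_k = log Z_K + log(c_k / c_K) + log(r_K / r_k), anchored at the known Z_K.
for name, c in [("TS (visit counts)", counts), ("RTS-style (Rao-Blackwellized)", rb)]:
    logZ_hat = true_logZ[-1] + np.log(c / c[-1]) + np.log(r[-1] / r)
    print(name, "max abs error in log Z:", np.abs(logZ_hat - true_logZ).max())

If this reading is correct, the only difference between plain TS and the Rao-Blackwellized version is the line "rb += w" versus "counts[k] += 1.0", which is why I would have liked the two to be compared head-to-head throughout the figures.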
=====

Review #2
=====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces an easy-to-implement and computationally efficient estimator of the partition function, derived from the empirical marginals of the temperature distribution in a run of simulated tempering. The experiments do a fair job of demonstrating its accuracy in practice, and the authors show how it relates to other known estimators.

Clarity - Justification:
I found this paper generally very easy to follow. A few suggestions on clarity:
- Lines 108-110: the sentence "We conclude in Section 5." adds a whole line of whitespace for not much content.
- Lines 170-180: perhaps mention briefly how sampling x | beta is accomplished? Normally we do not have access to samplers for all f_k(x).
- Lines 571-582: I found this distracting and not necessary.

Significance - Justification:
The estimator has the potential to be used widely, since it is easy to implement and appears to perform well.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
- I am a little suspicious about the quality of this estimator. AIS is generally very hard to beat, because it reduces variance by chaining importance weights, whereas this estimator seems to include the annealing information only via the normalization constant of q(beta | x). It may help to expand the bias/variance section to compare it more explicitly to the variance of AIS or other popular methods; I would have found that more valuable than Section 2.3.
- Line 448 and line 466: "maximizing the log-likelihood" versus "finding the minimum of (18)" -- is this a mistake?
- I found the experiments generally compelling, but I wonder if you could include an additional experiment: annealing from the uniform distribution to the RBM of Salakhutdinov and Murray is generally poorly behaved, and I wonder how this estimator performs in that case.
- How did you get the "true" value for the RBM of Salakhutdinov and Murray? It is intractable to compute, so I would hesitate to call it the "true" value.

=====

Review #3
=====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors present a novel method to compute partition functions.

Clarity - Justification:
The paper is very well written, easy to follow, and enlightening in its review of the literature and the connections it draws with the novel method.

Significance - Justification:
The problem is well motivated, the literature is broadly analyzed, and the method seems novel to me. In my opinion, this paper is a very good fit for ICML, and I therefore recommend its acceptance.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The authors state repeatedly that the method comes for free in any application using tempered sampling. I think that the estimation of partition functions is usually free in IS-based methods, but this is not the case in MCMC algorithms. I would appreciate further comments on the computational complexity of this method and a comparison with the other methods.
In line 158, \beta_k is treated as a random variable. It would be useful to explain why it makes sense to treat \beta as a random variable instead of using a preselected sequence of values; a short derivation along the lines I sketch below would already help.
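For concreteness, this is the kind of short argument I have in mind (my own notation, which may not match the paper exactly). Simulated tempering targets the joint

    q(x, \beta_k) \propto r_k f_k(x),   k = 1, ..., K,

so integrating out x gives the temperature marginal

    q(\beta_k) = r_k Z_k / \sum_j r_j Z_j,

and hence

    Z_k / Z_1 = ( q(\beta_k) / q(\beta_1) ) * ( r_1 / r_k ).

In other words, it is precisely because \beta is a random variable with this stationary marginal that the ratios Z_k / Z_1 can be read off from an estimate of q(\beta_k) within the same run; with a preselected deterministic schedule there is, as far as I can tell, no such marginal to estimate.
=====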