Paper ID: 981
Title: Pliable Rejection Sampling

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper considers the problem of drawing i.i.d. samples from an unnormalized density function f. Specifically, the authors consider the fixed-budget setting, in which an algorithm can query $n$ samples and aims to maximize the number of i.i.d. samples drawn from f in the end. The main result is a new rejection sampling method based on a kernel density estimator, together with a theoretical lower bound on the number of i.i.d. samples the algorithm obtains. The empirical results show that the proposed method, PRS, works better than standard rejection sampling and is better than or comparable to A^* sampling in low-dimensional settings.

Clarity - Justification:
The paper is easy to follow and the motivation is clear.

Significance - Justification:
The main result is very interesting. The theoretical guarantee on the acceptance rate is very attractive and of significant importance. The algorithm also seems to require less information than its competitors, though I have no idea how to set the parameter H_C.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I do not understand how one can choose H_C. It seems to me that not setting H_C correctly would break the guarantee, right? In the experiments, the authors state that they use cross-validation to choose H_C, but I do not see how to do CV in this setting: how do you split folds, and from what? How is it different from simply trying a few values on a small set of samples to get a sense of H_C?

093: was build => was built
569: 1+r_n => should it be r_N instead?
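For reference, the standard recipe for cross-validating a KDE bandwidth is to split the available sample into folds and pick the bandwidth maximising held-out log-likelihood. The sketch below illustrates only that generic recipe; the Gaussian kernel, the bandwidth grid, and the fold scheme are illustrative assumptions, not the authors' procedure for H_C.

```python
import numpy as np

def gaussian_kde_logpdf(x_eval, x_train, h):
    """Log-density at x_eval of a Gaussian KDE with bandwidth h fit to x_train (1-D)."""
    d2 = (x_eval[:, None] - x_train[None, :]) ** 2        # pairwise squared distances
    log_k = -0.5 * d2 / h**2 - 0.5 * np.log(2 * np.pi * h**2)
    m = log_k.max(axis=1, keepdims=True)                  # stable log-sum-exp
    lse = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    return lse - np.log(len(x_train))

def cv_bandwidth(x, grid, n_folds=5, seed=0):
    """Pick the bandwidth in `grid` maximising k-fold held-out log-likelihood."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = []
    for h in grid:
        held_out = 0.0
        for k in range(n_folds):
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            held_out += gaussian_kde_logpdf(x[folds[k]], x[train], h).sum()
        scores.append(held_out)
    return grid[int(np.argmax(scores))]

# Illustrative use on a synthetic sample.
x = np.random.default_rng(1).normal(size=500)
h_star = cv_bandwidth(x, np.logspace(-2, 0, 20))
```

Held-out log-likelihood is the usual CV criterion for density estimation; what remains unclear in this paper's setting is which sample one would split into folds in the first place, which is exactly the question raised above.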
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper develops a kernel density estimator to design the sampling envelope for rejection sampling. The advantage of the new sampling strategy is that the samples obtained are i.i.d. and distributed according to f.

Clarity - Justification:
The paper reads well.

Significance - Justification:
This paper has very nice theory and an easy-to-implement algorithm.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The theoretical proof (especially Theorem 2) is very nice. Another advantage is that although the theory part is long, the main algorithm (Algorithm 1) is very straightforward. The experimental results are not very strong, however. The datasets chosen in this paper are not very challenging. Moreover, there are not enough comparisons with previous work, so it is not clear to me how the proposed method compares with the state of the art. To me, the most attractive part lies in the high-dimensional case (Section 3.6). However, I am not convinced of how powerful the high-dimensional algorithm is, given the limited experimental support. Since sampling in high dimensions is very important in practice, I wish the authors would show more results on this aspect.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a rejection sampling method. The cornerstone of the algorithm is Theorem 1, which provides an explicit bound on the sup-norm distance between the target density $f$ and a user-constructed enveloping function. This enveloping function involves a number, $N$, of evaluations of the target density $f$ at points sampled uniformly from the (assumed bounded) support of $f$. Due to the randomness in the construction of the enveloping function, the bound holds with high probability, controllable by the user. The paper proposes approaches for treating unbounded spaces (in the Appendix) and high dimension $d$. It shows two applied examples.

Clarity - Justification:
The paper describes a simple algorithm, and is well-written.

Significance - Justification:
I am not sure I am convinced about the scope of the method. As the authors recognise, the approach is mostly useful in low dimensions. Section 3.6 suggests using an optimisation procedure to locate the part of the state space with non-negligible probability, in a number of steps polynomial in $d$. I still do not understand how these points, once plugged into the algorithm, overcome the curse of dimensionality; some more details would be useful at this point.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
See my answers above.

=====
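To ground the construction the reviews describe, the sketch below shows the generic pattern: fit a kernel density estimate to a pilot sample, inflate it by a slack term so that it dominates f with high probability, and run standard rejection sampling against that envelope. The Gaussian kernel, the pilot sample, and the slack r_n are illustrative assumptions standing in for the paper's explicit bound; this is a sketch of the pattern, not the paper's Algorithm 1.

```python
import numpy as np

def prs_style_sampler(f, centers, h, r_n, n_props, seed=0):
    """Generic KDE-envelope rejection sampler in 1-D (sketch).

    f       : unnormalised target density, vectorised over numpy arrays
    centers : pilot points around which the KDE is built
    h       : kernel bandwidth (the role H_C plays in the paper)
    r_n     : slack inflating the KDE so (1 + r_n) * kde >= f w.h.p.
    """
    rng = np.random.default_rng(seed)

    def kde(x):
        # Gaussian KDE: (1 / (N h)) * sum_i phi((x - c_i) / h)
        d2 = (x[:, None] - centers[None, :]) ** 2
        return np.exp(-0.5 * d2 / h**2).sum(axis=1) / (len(centers) * h * np.sqrt(2 * np.pi))

    # Sampling from the KDE mixture: pick a centre, add kernel noise.
    props = rng.choice(centers, size=n_props) + h * rng.standard_normal(n_props)
    envelope = (1.0 + r_n) * kde(props)
    # Standard rejection step: accept when u * envelope <= f. Exactness requires
    # the envelope to dominate f, which the paper guarantees w.h.p.
    accept = rng.uniform(size=n_props) * envelope <= f(props)
    return props[accept]

# Illustrative use: a bump on [0, 1]; the pilot sample is a crude stand-in
# for points where f has already been evaluated.
f = lambda x: np.where((x >= 0) & (x <= 1), np.exp(-8.0 * (x - 0.5) ** 2), 0.0)
pilot = np.clip(np.random.default_rng(2).normal(0.5, 0.25, size=300), 0.0, 1.0)
draws = prs_style_sampler(f, pilot, h=0.1, r_n=0.5, n_props=10_000)
```

When the inflated KDE really does dominate f, the accepted proposals are exact i.i.d. draws from the normalised target, which is the property all three reviews highlight.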