Paper ID: 1180 Title: Interacting Particle Markov Chain Monte Carlo

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): This paper introduces a general method to address the mixing problem encountered by some Particle MCMC (PMCMC) samplers. This mixing problem typically arises in particle Gibbs (PG), in which the simulation of the latent state vector is carried out not through an independent (unconditional) SMC, as one would naturally do, but through a conditional SMC (CSMC) conditioned on the current state of the Markov chain. This specific trajectory (deterministic conditionally on the current state of the Markov chain) may result in weight degeneracy, especially for the early states of the CSMC, with most of the weight mass concentrated on this specific trajectory. iPMCMC simulates a Markov chain defined on a product space of M trajectories and some auxiliary variables generated by the algorithm. The M trajectories are produced by M nodes that have two possible roles: either they are SMC nodes and produce a sequence of weighted trajectories independently of the state of the Markov chain (through SMC), or they are CSMC nodes and run CSMC conditioned on one of the trajectories of the current state of the Markov chain. The number of CSMC nodes is a user-defined parameter P (whose optimality is discussed). Central to iPMCMC is that nodes can switch roles (with a probability that maintains the correct target distribution). This means that at each transition of the Markov chain, one of the trajectories produced by an independent SMC node can be included among the trajectories of the new state of the Markov chain with non-zero probability. Since this trajectory will serve as the deterministic trajectory used by a CSMC node at the next transition, this decreases the risk of path degeneracy. iPMCMC will be most efficient when the probability of switching roles is high.
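The role-switching step summarized above can be sketched as a Gibbs-style refresh of the conditional-node indices: each conditional index competes against all currently unconditional nodes, with probabilities proportional to the nodes' normalizing-constant estimates. The following toy is my own minimal illustration, not code from the paper; `M`, `P`, and the simulated estimates `z_hat` are placeholder values standing in for real SMC/CSMC sweeps.

```python
import numpy as np

rng = np.random.default_rng(0)

M, P = 8, 4  # M worker nodes in total, P of them currently conditional (CSMC)

# Stand-ins for the per-node marginal-likelihood estimates Z_hat_m that each
# SMC/CSMC sweep would produce (hypothetical values, not real particle filters).
z_hat = rng.lognormal(mean=0.0, sigma=1.0, size=M)

# Current conditional-node indices c_1, ..., c_P (distinct by construction).
c = list(rng.choice(M, size=P, replace=False))

# Gibbs-style refresh: node c_j competes with all unconditional nodes,
# each chosen with probability proportional to its estimate z_hat[m].
for j in range(P):
    candidates = [m for m in range(M) if m not in c or m == c[j]]
    w = z_hat[candidates]
    c[j] = candidates[rng.choice(len(candidates), p=w / w.sum())]

print(sorted(c))  # the refreshed set of conditional nodes, still distinct
```

Because each update excludes the other current conditional indices from the candidate set, the refreshed indices remain distinct, which is the property the switching step needs to preserve.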
Otherwise iPMCMC can be regarded as a particle Gibbs sampler on an extended state space (with multiple, non-interacting trajectories). The authors justify why setting P=M/2 (i.e., as many SMC nodes as CSMC nodes) is likely to be optimal. Parallel implementations are possible since the nodes' jobs are independent conditionally on the current state of the Markov chain. Two convincing examples on linear Gaussian state space models and nonlinear SSMs illustrate the performance of iPMCMC.

Clarity - Justification: The paper is a bit dense for this paper format. This is because what motivates iPMCMC is quite subtle. A reader unfamiliar with the PMCMC literature might struggle to understand the key novelty conveyed by iPMCMC and why it is appealing. I have a few points:
- Section 3: the overhead related to PG is not detailed enough (1st paragraph). Adding a few words about why PG suffers from path degeneracy would help the reader understand the motivation behind iPMCMC. This is central to the paper, in my opinion.
- How are the P initial trajectories $\mathbf{x}_{1:P}'[0]$ generated, through P SMCs?
- Notation could be made a bit lighter. For instance, why is $\hat{Z}_{\pi_T,m}$ indexed by $\pi_T$? This can be made implicit without losing generality.
- Eqs. 4 and 5 are central to the iPMCMC methodology; it could be worth adding a few explanations.
- It should be made clearer whether the Markov chain targeted by iPMCMC has P or M trajectories. The theoretical section defines the target with M trajectories, yet Algorithm 3 only returns P trajectories. I understand that only those P trajectories are needed for the next iteration, but for consistency it would be better to either state this or return the M trajectories in Alg. 3.
- The issue of whether the index variable $\mathbf{b}=(N,...,N)$ can be set constant is not clear (at least to me). The paragraph right before Thm 1 does not help.
Maybe citing a paper that gives more explanations at this stage would help. As it stands, these auxiliary variables seem to come out of nowhere in the theoretical part of the work, while they are omitted from the methodology exposition.
- On the illustrations: it can be observed in Figures 3b, 4, and 5 that iPMCMC outperforms other methods when sampling early states of the latent trajectory. Is this related to the fact that the path degeneracy issues of non-interacting PMCMC are emphasized for these states? If you believe this could be an explanation, then make the point. This would be yet another reason to give more details on this problem (see the start of Section 3 and my comment above).
- It seems from Figure 3a that the iPMCMC benefit kicks in after a burn-in period (approx. 100 iterations). Is this because, during the transient phase of the Markov chain, the deterministic trajectories used by the CSMC nodes are far from the high-density region, which paradoxically allows better mixing and therefore does not penalize non-interacting PMCMC (or PMCMC with no unconditional SMC steps)?

Significance - Justification: For this format, the authors have provided:
- an alternative implementation of usual PMCMC methods that may come for free if a parallel environment is available;
- even if one is not available, adding trajectories produced by unconditional SMC to the Markov chain states certainly improves mixing, and thus improves on non-interacting PMCMC such as mPG (multi-start PG);
- a theoretical justification that the trajectories produced by CSMC nodes are marginally realizations of the filter distribution (if the chain has reached stationarity);
- a meaningful discussion of the choice of P;
- convincing illustrations on benchmark state space models.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): There is certainly more to build theoretically on the iPMCMC methodology.
In particular, the variance reduction resulting from introducing "refreshed" or "independent" trajectories might be theoretically investigated. The work by Chopin and Singh certainly offers a starting point. Comparing iPMCMC with, e.g., mPG might be possible through Peskun ordering (see Tierney's 1995 paper). Indeed, it seems that the randomness introduced by including trajectories coming ex nihilo from independent SMC will yield a transition kernel for iPMCMC that dominates that of mPG in the Peskun ordering sense. There is also a connection with the non-PMCMC literature. Consider the problem of sampling a model index and a (static) parameter from a mixture distribution. It has been shown that introducing auxiliary variables into the Markov chain (equivalent to the unconditional SMC nodes in iPMCMC) helps the chain move between models (see, e.g., Carlin and Chib 1995 and Douc et al. 2014). Standard Gibbs struggles to switch between models for the same reason that PG gets stuck when the chain has reached high-density regions. The pseudo-prior distributions (included in the Carlin and Chib method) would in some sense be analogous to the SMC nodes in iPMCMC. Some recent work has been done on this topic, relating the introduction of independent randomness into the Markov chain to reductions in the asymptotic variance of the Monte Carlo estimate (see, e.g., Maire et al. 2014).

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): This paper describes an extension of particle Gibbs with an additional move that introduces an (essentially) complete refresh of the conditioned particle path. This is novel and does seem to lead to improvements within a multi-core processing environment.

Clarity - Justification: I thought the idea could be presented much more clearly. I found it hard to get the basic idea of the algorithm. As one example, in the middle of the description at the start of Section 3 you interject a comment on computational cost.
I would think a description of the algorithm, followed by a subsection/paragraph on computational cost, would increase readability. Also, some of the maths could be explained in words (e.g., the key idea is the step in equation (4): could you explain what is happening here?), or maybe give a clearer overview of the algorithm at the start of this section. I also do not understand the comment "This ensures that all the conditional nodes would be distinct."

Minor comment: "we will almost surely pick the corresponding particle..." (l. 216). I do not think this is true. It may be true that you will pick a particle whose path coalesces with the conditioned path, but normally it will be distinct at the most recent time-steps. Also, there are approaches to alleviate this coalescing behaviour (ancestral sampling ideas) that should be mentioned here.

Significance - Justification: PMCMC is a hot area of research. The idea in this paper is novel and could have practically important implications as people use particle MCMC, particularly in a multi-core computing environment.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): My main concerns were with presentation (see above). In addition: the abstract seems to suggest that iPMCMC could be better than standard PMCMC more generally. This is not clear from the paper. Would iPMCMC with M lots of SMC/CSMC samplers, each with N particles, be better than a particle Gibbs sampler with MN particles? I wonder if Section 3.3 could be extended: under the normal approximation, can you get a feel for how the switching frequency depends on sigma? And what is the effect of changing M vs changing N? Can you prove P=M/2 is optimal (maybe for large sigma)?
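The reviewer's question about how switching frequency depends on sigma could be probed with a quick simulation. The sketch below is my own toy model, not an analysis from the paper: it assumes every node's log normalizing-constant estimate is an independent draw from N(-sigma^2/2, sigma^2), and the values of `M`, `P`, and `sigma` are placeholders chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def switch_rate(sigma, M=16, P=8, iters=200):
    """Monte Carlo estimate of how often a conditional index is replaced
    by a different node in one sweep, under the (assumed) model where
    log Z_hat_m ~ N(-sigma^2/2, sigma^2) independently across nodes."""
    switches = 0
    for _ in range(iters):
        z = np.exp(rng.normal(-sigma**2 / 2, sigma, size=M))
        c = list(range(P))  # nodes 0..P-1 start as conditional
        for j in range(P):
            # candidate set: node c_j plus all currently unconditional nodes
            cand = [m for m in range(M) if m not in c or m == c[j]]
            w = z[cand]
            new = cand[rng.choice(len(cand), p=w / w.sum())]
            switches += int(new != c[j])
            c[j] = new
    return switches / (iters * P)

for sigma in (0.5, 2.0, 5.0):
    print(f"sigma={sigma}: estimated switch rate {switch_rate(sigma):.2f}")
```

Under this crude model one can read off how the estimated switch rate varies with sigma, M, and P, which is the kind of qualitative feel the reviewer asks for; a faithful answer would of course have to use the actual dependence structure of the estimates in the paper.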
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): The authors propose interacting PMCMC, which consists of a pool of unconditional SMC and conditional SMC processes, thereby allowing a better exploration vs. exploitation tradeoff. It seems to be a fairly straightforward extension of the method introduced by Huggins and Roy (2015); specifically, iPMCMC allows an arbitrary number of CSMC and SMC algorithms, unlike the previous work, which uses a single CSMC node. The paper introduces an additional step to sample new conditional workers (according to equation 4) and proves that iPMCMC is a valid MCMC sampler on an extended space. The experiments, on fairly simple problems, show that iPMCMC outperforms other PMCMC variants and evaluate the effect of the number of CSMC nodes. I found the paper very well written, and I like this extension since it is better suited for large-scale problems. However, I found the experiments underwhelming. Overall, the current version of the paper seems like an incremental contribution. The paper would be much more impressive if it contained an experiment that illustrated the full power of the algorithm. For example, an experiment where a single PG sampler would take an unreasonable amount of time would convincingly demonstrate that iPMCMC pushes the state of the art for large-scale problems.

Clarity - Justification: The paper is well written and was easy to follow (although it does assume familiarity with related work on PMCMC variants).

Significance - Justification: The proposed method is a conceptually simple extension of prior work, yet (probably) important in practice. However, without convincing experiments, the current version looks like an incremental contribution.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Conceptually, iPMCMC seems to be a fairly straightforward extension of the alpha-SMC-based PMCMC proposed by Huggins and Roy (2015).
I believe that the proposed extension is better suited for large-scale problems. However, the experiments are only on fairly simple problems (for which good solutions already exist). In my opinion, the paper would be considerably more impactful if it showed that iPMCMC successfully scales to a real-world problem where current solutions fall short. The authors also didn't really address the practical issues of distributing the computation; it would be interesting to see an asynchronous version of the proposed method.

Minor comment: g_1(y_t) needs to be g_1(y_1) in Alg. 1 and 2.

=====