Paper ID: 521 Title: Markov Latent Feature Models

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper proposes a new approach to latent feature models, called Markov latent feature models. The model captures correlations among the latent features through a state-transition matrix.

Clarity - Justification: This paper is easy to follow.

Significance - Justification: A new idea for introducing correlation modeling into latent feature models; this appears to be new.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): This paper introduces a new method for modeling correlations among latent features. The method appears to be new and useful. The variational inference itself seems fairly standard, though. The technique is quite interesting for identifying correlated latent features, as demonstrated by the experiments. I am a little puzzled by how the first return time defined in Eq. (4) is realized in the variational inference, since we do not know it in advance. Can you explain that in more detail?
=====

Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper presents a Markovian latent feature model for generating the features of a set of objects. The construction is designed so that the model is exchangeable, which leads to a simple inference algorithm. The paper presents two flavours of the model, parametric and nonparametric. It then applies the model to two tasks: genome analysis and image denoising.

Clarity - Justification: The paper is written reasonably clearly.

Significance - Justification: The paper presents an alternative construction to the Indian Buffet Process that captures first-order correlations between features.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The paper provides an interesting model and inference algorithms.
=====

Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): The paper proposes an interesting approach to dictionary learning and sparse coding, which realizes the selection of a sparse subset of dictionary atoms for each data point through a sequence of state transitions that both starts and ends at the null state. The correlations between the latent features are captured by the inferred state-transition probability matrix.

Clarity - Justification: The algorithm is clearly presented.

Significance - Justification: A related idea was used by Zhang & Paisley, 2015 for mixed-membership modeling. While mixed-membership modeling and latent feature learning are clearly different, the basic idea of modeling the correlations between latent clusters/features with a latent state-transition probability matrix is the same.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The actual implementation of the algorithm appears to be similar to K-SVD. K-SVD uses a greedy algorithm such as orthogonal matching pursuit (OMP) to select a subset of dictionary atoms to represent a data point under the learned dictionary, with a stopping criterion determined either by a predetermined noise variance or by a sparsity level. In OMP, the choice of dictionary atom at each step does not depend on which atom was selected in the previous step, only on the current residual; a minimal sketch follows.
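A minimal sketch of the OMP selection rule (the dictionary D, signal x, and function name are illustrative assumptions, not code from the paper under review):

```python
import numpy as np

def omp_select(D, x, sparsity):
    """Orthogonal matching pursuit: greedily select dictionary atoms.
    Each selection depends only on the current residual, not on which
    atom happened to be selected in the previous step."""
    residual, support = x.copy(), []
    for _ in range(sparsity):
        # atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        support.append(k)
        # least-squares refit on the selected atoms, then update the residual
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    return support
```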
The proposed algorithm improves on K-SVD by modeling the correlations between the dictionary atoms selected in two consecutive greedy search steps. Given the similarity between the proposed MLFM and K-SVD (both use greedy search for sparse coding), it would be helpful to also display K-SVD results in Figure 2 and Figure 3; I suspect that K-SVD may perform similarly to MLFM in both cases. Figure 4 looks interesting, but I suspect that if one records the sequence of dictionary atoms selected by K-SVD for each image patch and treats each dictionary atom as a state, one also obtains a state-to-state transition count matrix, which could be used to produce a figure similar to Figure 4 for the MLFM (a sketch of this diagnostic is given after this review). An important advantage of BPFA, a Bayesian model, over K-SVD is that BPFA can automatically infer the noise variance from the data. As a Bayesian model, however, the MLFM seems to require pre-setting the noise variance, which appears somewhat unsatisfactory. The results for K-SVD and BPFA in Table 1 differ from those reported in Aharon et al., 2006 and Zhou et al., 2009; it is necessary to clearly explain why these differences arise.
=====
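A sketch of the transition-count diagnostic suggested above (the sequence representation and the null-state convention are assumptions for illustration, not the authors' procedure):

```python
import numpy as np

def transition_counts(atom_sequences, num_atoms):
    """Turn per-patch sequences of selected dictionary atoms into a
    state-to-state transition count matrix. State 0 is a null state at
    which every sequence starts and ends; atom k maps to state k + 1."""
    counts = np.zeros((num_atoms + 1, num_atoms + 1), dtype=int)
    for seq in atom_sequences:
        prev = 0                      # start at the null state
        for atom in seq:
            counts[prev, atom + 1] += 1
            prev = atom + 1
        counts[prev, 0] += 1          # return to the null state
    return counts

# Example: two patches, atoms indexed from 0
# transition_counts([[3, 1, 7], [3, 1]], num_atoms=8)
```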