Paper ID: 1023
Title: Markov-modulated Marked Poisson Processes for Check-in Data

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a variant of Markov-modulated Poisson processes for modeling location-based user check-in data. The paper argues that the bursty nature of check-in events makes continuous-time modeling a much more natural approach than discrete-time methods. The proposed model is state-based: each state has an associated Poisson rate of generating check-in events as well as a spatial (Gaussian) distribution over check-in locations. Individual preferences are modeled by allowing for a user-dependent "modulation" of the state transition matrix via a (learned) user-specific vector of length N (the number of states); similarities (and differences) relative to LDA are discussed. Inference is carried out via MCMC, extending ideas from recent work by Rao and Teh. Experiments are carried out on a well-known check-in data set, comparing the proposed method to an LDA baseline. The results indicate that the proposed method systematically outperforms LDA (and several simpler "memory-based" baselines) on a number of prediction tasks on this data set.

Clarity - Justification:
I would like to thank the authors for writing in a nice, clear style; it is a pleasure to read a clearly written paper. The figures are used very well and the paper is easy to follow.

Significance - Justification:
(My rating here of "below average" should probably be somewhere between "below" and "above average".) This paper has some nice ideas for modeling check-in data: the models make sense and could be useful in practice. However, I think the paper could do a better job of being clear about the applicability of this technique. The model is ideally suited for users who tend to move around quite a bit between well-defined "popular" locations.
One could argue, however, that most users (e.g., those who have mobile devices) do not have these characteristics: instead of looking like the 2 users in Figure 7, most users will spend much of their time in one city (such as NYC or Dallas). There is a hint that this is the case in Section 4, under "Prediction", with the selection of test users who have "sufficient variance", i.e., users who move around a lot. I would like to see more discussion of this aspect of the data and how it affects the experiments in the paper and the potential applicability of the method. It would probably strengthen the paper to be a bit more upfront about this, e.g., by stating that the model's strengths will be in modeling users with relatively high location variance.

Another weakness of the paper is the selection of LDA as the main competing technique; it seems like it should be possible to use a somewhat stronger baseline here. A related point is that the survey of related work seems incomplete: there are quite a few other papers that look at trajectory modeling with various forms of Markov models. Some (randomly selected) examples are:

Alvarez-Lozano, Jorge, J. Antonio García-Macías, and Edgar Chávez. "Learning and user adaptation in location forecasting." Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication. ACM, 2013.

Gao, H., Tang, J., & Liu, H. (2012, June). Mobile location prediction in spatio-temporal context. In Nokia mobile data challenge workshop (Vol. 41, p. 44).

Some of these models may be relatively simple and not applicable to the problems considered in this paper, but the related work should include a better survey of this literature. In addition, it would be good to use some of these approaches as baselines, and perhaps to use some of the metrics from these other papers for evaluation.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
- (see earlier comments): an overall discussion of the applicability of the model would be helpful, i.e., the extent to which it will work best when given data on users who tend to move around a lot
- in the introduction, if space permits, a figure showing a histogram of the inter-event times (between check-ins), say on a log scale, would be informative to readers in terms of setting the general context of the timescale for these types of data sets
- Equation (1): what does the notation S_{h_i} mean? Add some text here to explain this if you can
- the modulation of A_ij with B_uj is a nice idea, but one wonders if this particular parametrization might have some potential limitations; it would be helpful to the reader to mention any limitations imposed by this parametrization (when it is introduced in Section 2)
- you say your model "closely relates to LDA"; I would suggest it is "loosely" related, in the sense that they both allow for an individual (or document) dependent vector of "state preferences", but otherwise they seem somewhat different in their formulations
- in the first paragraph in Section 2 you mention running FFBS on a "random discretization of time"; it would be helpful to a typical reader to elaborate on this a little here, e.g., explain the associated example in Figure 4 a little more clearly
- if you can make your code publicly available, that would be good to do (I assume the data is publicly available?)

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes the marked Markov-modulated Poisson process (mMMPP), an extension of the Markov-modulated Poisson process specifically designed for modeling Foursquare check-in data. The latent states correspond to states in the US. The number of check-ins is modeled with a Poisson process and the check-in locations are modeled with a Gaussian distribution.
Inference is carried out via MCMC, which is mainly based on (Rao & Teh 2014). Applications (i.e., data exploration, prediction, and anomaly detection) on Foursquare check-in data demonstrate the effectiveness of the method.

Clarity - Justification:
The paper is written carefully. Readers can understand the material easily.

Significance - Justification:
The novelty of the model and inference is relatively limited. As a variation of the MMPP model, the inference is mainly based on Algorithm 2 in [Rao & Teh 2014].

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

Strength
1. The model is well designed for modeling user check-in data. For example, it factorizes the effective transition rate into a global transition rate A and user preferences B, and incorporates location information by modeling each check-in location as a state-dependent Gaussian random vector. These intuitions are realistic and lead to good model interpretability.
2. Several novel applications (i.e., prediction, anomaly detection) are discussed. Experiments are explained in detail, making a valuable contribution to check-in data modeling and more generally to trajectory modeling.

Weakness
1. The novelty of the model and inference is relatively limited. As a variation of the MMPP model, the inference is mainly based on Algorithm 2 in [Rao & Teh 2014]. The differences between the two models are (1) factorizing the effective rate matrix into a global rate matrix and a user preference vector modeled with a Gamma prior, and (2) incorporating location information into the trajectory likelihood with a Gaussian assumption. However, the inference section of this paper does not describe how to learn the mean and covariance matrix of the location distribution.
2. Although the model is well suited to check-in data, the relatively narrow scope limits its impact on the community. In addition, the experiments cover only one dataset (Foursquare check-in data) and one baseline model (LDA).
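To make the two modeling ingredients named under Weakness 1 concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes that the effective rate of a jump from state i to state j is the product A_ij * B_uj; the reviews only say that A is modulated by a user-specific vector, so the paper's exact parametrization may differ. The function names, the uniform initial state, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def effective_generator(A, b_u):
    """Return a user-specific generator matrix Q.

    ASSUMPTION: the rate of jumping from state i to state j is A[i, j] * b_u[j].
    The paper's exact parametrization may differ.
    """
    Q = np.asarray(A, dtype=float) * np.asarray(b_u, dtype=float)[np.newaxis, :]
    np.fill_diagonal(Q, 0.0)             # drop self-transitions
    np.fill_diagonal(Q, -Q.sum(axis=1))  # each row of a generator sums to zero
    return Q


def simulate_checkins(Q, rates, means, covs, t_end, rng):
    """Simulate one user's check-ins as (time, state, location) on [0, t_end]."""
    n_states = Q.shape[0]
    t, s = 0.0, int(rng.integers(n_states))  # uniform initial state (assumption)
    events = []
    while t < t_end:
        leave_rate = -Q[s, s]
        # Exponential holding time in the current state (infinite if absorbing).
        hold = rng.exponential(1.0 / leave_rate) if leave_rate > 0 else np.inf
        seg_end = min(t + hold, t_end)
        # Within a state, check-ins form a homogeneous Poisson process:
        # draw the count, then place the times uniformly in the segment.
        n_events = rng.poisson(rates[s] * (seg_end - t))
        for ti in np.sort(rng.uniform(t, seg_end, size=n_events)):
            loc = rng.multivariate_normal(means[s], covs[s])  # Gaussian mark
            events.append((float(ti), s, loc))
        if seg_end >= t_end:
            break
        # Choose the next state in proportion to the off-diagonal rates.
        jump_probs = Q[s].copy()
        jump_probs[s] = 0.0
        s = int(rng.choice(n_states, p=jump_probs / leave_rate))
        t = seg_end
    return events


# Illustrative values only; none of these numbers come from the paper.
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.5],
              [0.2, 0.2, 0.0]])      # global transition rates
b_u = np.array([2.0, 0.5, 1.0])      # one user's state preferences
Q = effective_generator(A, b_u)
rates = np.array([3.0, 1.0, 0.5])    # check-in rate in each state
means = np.array([[40.7, -74.0], [32.8, -96.8], [37.8, -122.4]])
covs = np.array([np.eye(2) * 0.01] * 3)
events = simulate_checkins(Q, rates, means, covs, t_end=10.0, rng=rng)
```

Under this assumed parametrization, a large B_uj makes user u enter state j more often from every other state, but it cannot express a preference that depends on the state being left. This is one example of the kind of limitation Review #1 asks the authors to discuss.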
This is good applied work: it applies a well-designed MMPP model to check-in data analysis, including data exploration, trajectory prediction, and anomaly detection. The paper can be improved in the following respects: (1) clarify the contribution to model inference beyond a direct application of [Rao & Teh 2014]; (2) elaborate on the modeling of user (trajectory) dependent information; (3) for real-world data applications, consider adding more baselines, such as classification-based trajectory anomaly detection [Piciarelli 2008].

Rao, Vinayak, and Yee Whye Teh. "Fast MCMC sampling for Markov jump processes and extensions." The Journal of Machine Learning Research 14.1 (2013): 3295-3320.

Piciarelli, Claudio, Christian Micheloni, and Gian Luca Foresti. "Trajectory-based anomalous event detection." Circuits and Systems for Video Technology, IEEE Transactions on 18.11 (2008): 1544-1554.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors study Foursquare check-in data. They propose to use marked Markov-modulated Poisson processes to model the data and report good results on a series of tasks (various visualisations, probability of check-in at a location, anomaly detection). The authors build on Rao and Teh (2014) but propose a new prior to facilitate inference and extend the model to account for user-specific trends.

Clarity - Justification:
The paper is very well written and well motivated, providing all the necessary intuition about the problems that are considered and reporting extensive experimental results. As this is mainly an application paper, I would have liked to see a more detailed description of the data. The technical details and the experimental setup are not described in sufficient detail, which means that it would be difficult to reproduce the analysis and the results. Still, I think the extensive set of experiments makes it a worthwhile paper.

Significance - Justification:
The study is well executed and the authors extend previous work (in particular the work of Rao and Teh) by considering a prior that is more suitable to the problem at hand and which makes inference relatively easy. Also, the proposed model is able to capture individual trends by allowing for user-specific latent variables. This proved beneficial in several prediction tasks described in the experiments section. While Section 3 provides an outline of the inference procedure, most of the mathematical details are left to the reader and, unless I am mistaken, are not included in the supplementary material. Along similar lines, the authors do not define MJPs, Poisson processes, and the like, and assume instead that the reader is familiar with these concepts and with the authors' notation and parametrisations. The experiments are extensive. The results are convincing and discussed in great detail. The anomaly detection experiment is somewhat artificial, but I was happy to see the final discussion comparing the proposed approach to its discrete-time counterpart. Overall, I found the discussion and analysis very interesting and compelling.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
- What is meant by the "effective" rate matrix A in Section 3?
- The authors seem to be unaware of the paper by Opper and Sanguinetti (2008), "Variational inference for Markov jump processes", which might provide a starting point for an alternative inference scheme for the problems studied in this paper.

=====