We thank the reviewers for their detailed comments on our paper. We are also thankful for the positive comments on the model and presentation. Below we address some general and specific concerns expressed in the reviews.

Inference algorithm (reviewer_1, reviewer_5). Because of limited space, we left out the details of some standard steps of the inference algorithm, including the procedure to sample the parameters of the location distributions. To address this concern, we will add details in the supplementary material; we will also make our code available. While our inference algorithm for the MJP paths follows the uniformization algorithm of (Rao & Teh), we chose to emphasize an alternative global-local Gamma prior over the elements of the MJP rate matrix. Our construction allows a shared global "transition bias", along with user-specific preference vectors, that remains conjugate to the MJP likelihood.

Scope of applications of model (reviewer_1). Our starting point in this work was the check-in data, and we think this is an important type of data in the mobile era. It exists not only in explicit check-in applications (e.g., FourSquare), but also implicitly in other activities such as posting (geo-tagged) photos in social networks. We decided to emphasize our particular application to have a consistent theme throughout, and to better allow the reader to develop a feel for the consequences of our modeling assumptions. More generally, our model is applicable to other types of data that consist of time-stamped sequences of observations (locations in our case). Check-in data may also contain information other than time and locations, such as location category and the friendship between users, presenting new challenges and opportunities in modeling check-in data.

Baselines (reviewer_1, reviewer_3). We compare with LDA since the local-global aspect of our model can be viewed as a continuous-time extension of LDA.
We were not aware of another statistical model of check-in data. One of the state-of-the-art methods is a mixture model (Lichman & Smyth), and we think LDA with Gaussian topics is a richer model.

Dataset (reviewer_1). We chose to focus on one dataset to highlight in detail the pipeline of data exploration, visualization, prediction, and anomaly detection. We also performed experiments at different scales (USA scale and Florida scale), which gave different insights. One can also study other US states or even cities using the same dataset.

Location variance (reviewer_3). We agree with reviewer_3 that our model is more appropriate for modeling users with high location variance. As mentioned above, a user always staying in one city has low variance at the USA scale, but may be interesting to model at the city scale (e.g., fit a model on data restricted to a city).

Related work (reviewer_3). There has been a lot of work on trajectory mining and data management in recent years, but very little on probabilistic modeling. It is not possible to survey all of it, but we agree with reviewer_3 that the work on trajectory modeling using Markov models should be included, and we will do this. Our continuous-time approach allows a more natural representation of datasets with activity at multiple time-scales.

Specific response to reviewer_3:
- S(h_i) is the MJP state of the user at time h_i
- we will make the code publicly available
- thanks for the other editing suggestions, which we will incorporate as appropriate

Specific response to reviewer_5:
- we will give a more detailed description of the data and the experiment setup
- the effective rate matrix \tilde{A} of a user is constructed from the global rate matrix A and the user preference vector B_u, namely \tilde{A}_{ij} = A_{ij} B_{uj}
- thanks for pointing out the (Opper & Sanguinetti) paper; we were aware of it, and have now included a reference to it
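
To make the construction of the effective rate matrix in the response to reviewer_5 concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not our released code: the function name and the example values are hypothetical, and the diagonal is filled in by the usual generator-matrix convention (each row sums to zero), which the formula above does not itself specify.

```python
def effective_rate_matrix(A, b_u):
    """Build a user's effective rate matrix from global rates A and preferences b_u.

    Off-diagonal entries follow A_tilde[i][j] = A[i][j] * b_u[j].
    ASSUMPTION: the diagonal is set to minus the sum of the off-diagonal
    entries in its row (standard generator-matrix convention); this is not
    stated in the rebuttal text.
    """
    n = len(A)
    if any(len(row) != n for row in A) or len(b_u) != n:
        raise ValueError("A must be n x n and b_u must have length n")
    A_tilde = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Scale the global rate into state j by the user's preference for j.
                A_tilde[i][j] = A[i][j] * b_u[j]
        # The diagonal is still 0.0 here, so this sums only the off-diagonals.
        A_tilde[i][i] = -sum(A_tilde[i])
    return A_tilde


# Hypothetical example with three states; the numbers are made up.
A_global = [
    [0.0, 1.0, 2.0],
    [0.5, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
b_user = [1.0, 2.0, 0.5]
A_user = effective_rate_matrix(A_global, b_user)
```

In this sketch a larger entry of b_user raises the rate of entering the corresponding state from every other state, which is the sense in which B_u acts as a user-specific preference on top of the shared matrix A.
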
Moshe Lichman, Padhraic Smyth. Modeling human location data with mixtures of kernel densities. KDD 2014.