Paper ID: 1047
Title: Efficient Multi-Instance Learning for Activity Recognition from Time Series Data Using an Auto-Regressive Hidden Markov Model

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The paper proposes a probabilistic model (and associated training algorithm) for activity recognition in time series data. The idea is to use multi-instance learning to reduce the burden of labeling training examples when training an auto-regressive hidden Markov model. The authors build a multi-instance likelihood into their latent variable model, and show how to reformulate the model to perform efficient training with a forward-backward EM scheme.

Clarity - Justification:

The paper is in general well written, well argued, and easy to follow.

Significance - Justification:

The model is a principled approach combining two previously disparate lines of work, HMMs and multi-instance learning, motivated by a valuable task, activity recognition.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

This is a very nice piece of work. Probabilistic modeling approaches to multi-instance learning are surprisingly under-explored, given the extent to which multi-instance learning has been studied in the literature, and this is a useful step in that direction. The proposed model and training algorithm are a solid contribution. The multi-instance likelihood model, which captures a probabilistic relaxation of standard multi-instance modeling assumptions, is in itself an interesting contribution to the MIL community. The motivating application and efficient E-step procedure add to the value of the work, and the experiments are convincing.

The main limitation of the proposed approach arises from the use of multi-instance modeling assumptions, which typically require that the classification label be binary. This is not directly appropriate in the activity recognition setting, which is inherently a multi-class problem. The workaround is to run the model with one-against-the-rest labels for each label of interest, and this seems sufficient for the time being, as shown by the experiments. It should be noted that this limitation is a property of the majority of the work in multi-instance learning, the primary exception being:

Zhou, Z. H., Zhang, M. L., Huang, S. J., & Li, Y. F. (2012). Multi-instance multi-label learning. Artificial Intelligence, 176(1), 2291-2320.

Extensions along the lines of that paper could be an interesting direction for future work.

Other small suggestions for the authors:
- Pg. 1: the quotation marks around "free-living" face the wrong way. Use `` and '' to get the direction of the quotation marks correct in LaTeX.
- Perhaps too much of the paper is dedicated to the discussion of the message passing algorithm for the E-step. After the model reformulation proposed by the authors, the forward-backward scheme is fairly standard and may not warrant this much discussion.
- The model uses uniform priors for some parameters to simplify inference. Conjugate priors for most of those parameters do exist and can be used to give a little more flexibility, as well as to facilitate fully Bayesian extensions, e.g. using an inverse Wishart prior for the Gaussian covariance matrix (see the sketch below).
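To make the conjugate-prior suggestion concrete, the standard inverse-Wishart update for a Gaussian covariance with known mean looks as follows (a textbook sketch, not notation from the paper under review; \Psi_0 and \nu_0 are generic prior hyperparameters):

```latex
% Conjugate inverse-Wishart update for a Gaussian covariance with known mean \mu.
\Sigma \sim \mathcal{IW}(\Psi_0, \nu_0), \quad
x_1, \dots, x_n \mid \Sigma \sim \mathcal{N}(\mu, \Sigma)
\;\Longrightarrow\;
\Sigma \mid x_{1:n} \sim \mathcal{IW}\Big(\Psi_0 + \textstyle\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top},\; \nu_0 + n\Big)
```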
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

This paper tries to solve activity recognition from time series data with a multiple instance learning model. Unlike supervised learning applied to this task, the multiple instance learning model would alleviate the burden of labelling. Specifically, the paper proposes an auto-regressive hidden Markov model, derives an efficient parameter learning strategy, and presents experimental results.

Clarity - Justification:

The basic idea is clear, but the paper is very dense in places. There are no numerical examples.

Significance - Justification:

An interesting and promising new approach to modeling time series with MIL. However, it has experimental flaws/weaknesses, and the results are only mildly better than some baselines.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

The paper is addressing an important problem. The methodology is novel and the derivation of the approach is interesting. The paper is mostly well written, though parts of it are dense.

The main question I had was the motivation for allowing Z to change with t instead of holding it fixed. Second, given that Z changes over t, why is there no dependence on the other Z's for X^t? Shouldn't (4) be X^t | X^{(t-p):(t-1)}, Z^{(t-p):(t-1)} or similar? (3) and (4) seem to be at cross purposes.

The experiments have some drawbacks. Although generally well presented and interesting, the parameter tuning for the baselines is poorly done. In our own work, we have found that ranges on the order of 10^-3 to 10^3 are needed for good results with SVMs (a sketch of such a sweep is given below).
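As an illustration of the kind of sweep meant here, a minimal scikit-learn sketch follows; the data variables and the choice of an RBF kernel are assumptions for illustration, not details from the paper:

```python
# Hypothetical grid search over logarithmic ranges spanning 1e-3 to 1e3,
# the kind of baseline tuning suggested above. X_train/y_train are placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": np.logspace(-3, 3, 7),      # 1e-3, 1e-2, ..., 1e3
    "gamma": np.logspace(-3, 3, 7),  # same range for the RBF kernel width
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)
# print(search.best_params_, search.best_score_)
```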
Another weakness is that the authors use 20 random splits to present their results, which makes the standard deviation figures biased and untrustworthy. It is not clear to me why, in 2016, this flawed experimental methodology is still being used.

In the experimental setup, the authors mention that they follow the setup from (Stikic 2011), which constructed the feature vector using both statistical features and FFT features. However, in the experiments, they run the baseline algorithms with only one of the two kinds of features, rather than combining them as the reference paper did. This likely deteriorates the performance of the baseline algorithms.

The reference paper (Stikic 2011) also proposed MIL-based algorithms for activity recognition, but no experimental results for these algorithms are given in this paper for comparison.

The authors may find this related paper useful:

M. Popescu and A. Mahnot. (2012). Early illness recognition using in-home monitoring sensors and multiple instance learning. Methods of Information in Medicine, 51(4), 359-367.

To conclude, this is a good paper with an interesting approach, but it also has weaknesses that, if corrected, would make it easier to argue for publication.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

This paper presents an autoregressive hidden Markov model for physical activity recognition that incorporates ideas from multiple instance learning. The proposed model, ARHMM-MIR, was demonstrated to outperform previous multiple instance learning methods on two data sets.

Clarity - Justification:

A lot of terminology is not clearly defined in context (e.g. bag), and the results are presented as a massive table from which it is not easy to see what is going on. See detailed comments.

Significance - Justification:

This paper cleverly applies some sophisticated ideas to an interesting domain problem. See detailed comments.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

This paper presents an interesting model for performing activity recognition from time series collected from sensor data. The main advantage of the model is that it alleviates the need to manually label a large amount of sensor data. The authors achieve this by extending ideas from multiple instance learning to directly model the time series, rather than feature representations that ignore the temporal dependence. The model and algorithm seem to be straightforward applications of existing ideas, but overall I think that the application of the ideas to the domain is interesting enough.

First, there are some explicit points that should be addressed:
- How did the authors decide on an AR order of 2? This should be described in the paper.
- Why is the message passing algorithm O(K^2 T^2) rather than O(K^2 T)? It doesn't look like the messages require summations over T. If they do, then this needs to be made clearer. (A minimal forward-pass sketch illustrating the O(K^2 T) count appears at the end of this review.)
- The dynamic programming algorithm seems to just be belief propagation applied to the proposed model. I appreciate the very detailed derivation of the algorithm, but was there a unique problem in this setting that needed solving? Additionally, the authors should look at some of the work on switching linear dynamical systems by David Barber, as he has developed similar algorithms for these types of models.

The rest of my comments deal with the presentation of the ideas in the paper. I found the paper somewhat difficult to read because a lot of notation and terminology is not clearly defined. For example, a bag is never precisely defined in the context of the paper. I thought that the experiments might provide some anecdotal evidence for what the bags are, but they did not. The authors should indicate very precisely how they got from raw time series data to the bags that they actually model. A figure like Fig. 1 with the bags overlaid could also be very helpful. Also, what is I in the equation around line 381?

An important aspect of the proposed model that really needs to be discussed more is that the model only predicts one activity. This means that you need one of these models per activity that you would like to model. The authors should indicate why this is reasonable to do and how multiple activities would be handled in practice.

The table in the experiments section is very hard to read and make sense of. The authors should really consider replacing it with a visual (even a simple bar chart would make it easier to see what is going on) and moving the table to the supplement.

Finally, I have some questions regarding practical aspects of the model that, if addressed in the paper, could increase its usefulness in the field:
- How many bags did you use in practice for the various data sets? If there are a lot of them, have you considered using stochastic optimization rather than coordinate ascent?
- How long are the bags? Is there domain knowledge that can help you choose this?

I'd like to reiterate that I think this paper is an interesting application of sophisticated machine learning methods to an interesting domain problem, and with the small things I brought up addressed, I think it could fit in well at ICML.
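For reference, here is a minimal log-space forward pass for a plain HMM, showing where the O(K^2 T) count comes from: each of the T steps costs one K x K reduction. This is an illustrative sketch of the standard recursion, not the authors' algorithm, and all array names are assumptions.

```python
# Standard HMM forward pass in log space: O(K^2) work per time step,
# hence O(K^2 * T) overall; there is no summation over T inside a step.
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_pi, log_A, log_obs):
    """log_pi: (K,) initial state log-probs.
    log_A: (K, K) transition log-probs, log_A[i, j] = log p(z_t = j | z_{t-1} = i).
    log_obs: (T, K) per-step observation log-likelihoods."""
    alpha = log_pi + log_obs[0]
    for t in range(1, log_obs.shape[0]):
        # One K x K logsumexp reduction per step: the O(K^2) inner cost.
        alpha = log_obs[t] + logsumexp(alpha[:, None] + log_A, axis=0)
    return logsumexp(alpha)
```

=====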