Paper ID: 174
Title: Hierarchical Span-Based Conditional Random Fields for Labeling and Segmenting Events in Wearable Sensor Data Streams

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors present a three-tiered CRF for time-series segmentation of health sensor data. A dynamic programming algorithm allows exact MAP inference in time quadratic in the sequence length, and max-margin learning is used to learn the model parameters.

Clarity - Justification:
The paper is generally clear and well written. Some of the factor definitions become a little hard to follow, but I'm not entirely sure how that could be improved. It would have been nice to state explicitly, at some point, that $\mathcal{Y} = \{0, 1\}$.

Significance - Justification:
Going beyond pairwise CRFs is clearly valuable for improving segmentations, and this work represents a principled, reasonable approach. The quadratic run time isn't great, however.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The method itself is pretty clean and interpretable. The results don't always favour the proposed methods over the tree CRF, but the tree CRF is itself a powerful method. In the presentation of the results, I appreciated the detailed description of the cross-validation strategy. I don't like reporting, for each test, the most stringent p-value threshold that it passes: either pick one level (0.05 or whatever) and always report significance against that level, or just report the p-value. It's annoying not having the different methods on the same plot in Fig. 3: why not split the panels by $\sigma^S$ instead, so that comparisons are easier?
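Review #1 attributes the quadratic run time to the dynamic program used for exact MAP inference. The sketch below is not the authors' algorithm; it is a minimal, generic span-based (semi-Markov style) Viterbi recursion, assuming a hypothetical scoring function `span_score(i, j, y)` for labeling the half-open span [i, j) with label y. It only illustrates where the quadratic cost comes from: every span end point is paired with every earlier start point.

```python
from typing import Callable, List, Optional, Tuple


def segmental_viterbi(
    T: int,
    labels: List[int],
    span_score: Callable[[int, int, int], float],
) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Exact MAP segmentation of positions 0..T-1 into labeled spans.

    span_score(i, j, y) is a hypothetical score for labeling [i, j) with y.
    The loops over (start, end) pairs make the cost O(T^2 * len(labels)).
    """
    best = [float("-inf")] * (T + 1)  # best[j]: best score for segmenting [0, j)
    back: List[Optional[Tuple[int, int]]] = [None] * (T + 1)  # (start, label) of last span
    best[0] = 0.0
    for j in range(1, T + 1):      # span end (exclusive)
        for i in range(j):         # span start
            for y in labels:
                s = best[i] + span_score(i, j, y)
                if s > best[j]:
                    best[j] = s
                    back[j] = (i, y)
    # Follow back-pointers from the end to recover the optimal spans.
    spans = []
    j = T
    while j > 0:
        i, y = back[j]
        spans.append((i, j, y))
        j = i
    spans.reverse()
    return best[T], spans


# Toy example: binary observations, +1 per agreeing position,
# -1 per disagreeing position, and a penalty of 0.5 per span.
obs = [0, 1, 1, 0]


def toy_score(i: int, j: int, y: int) -> float:
    return sum(1.0 if obs[k] == y else -1.0 for k in range(i, j)) - 0.5


score, spans = segmental_viterbi(len(obs), [0, 1], toy_score)
print(score, spans)  # 2.5 [(0, 1, 0), (1, 3, 1), (3, 4, 0)]
```

The three-level model under review has considerably more structure (nested levels, constraints, count features); this toy version only shows why enumerating spans costs O(T^2).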
Minor edit: "First, every label variable $Y^{(1)}_i$ must be covered by exactly one span variable $Y^{(l)}_{jk}$ at each level $l > 1$." should read "exactly one non-null span variable".

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper describes a model for labeling and segmenting activities that consist of intermittent events, such as eating and smoking. The proposed CRF performs labeling and segmentation jointly, and includes features at the level of individual events, of spans of time between events, and of the number of positive events in an entire activity. The model is carefully structured so that MAP and marginal inference remain tractable, even though the features are fairly rich. Empirically, the method performs well on synthetic and real datasets compared with good baselines.

Clarity - Justification:
For the most part, the writing was very clear. The algorithm and experiments are described in enough detail to make it possible to replicate the results. There are a few places where the description is a bit hard to follow, though. A table listing all the factors at each level, or some other kind of visualization, might help. Figure 1 conveys the basic idea, but it is still very easy to lose track of all the different factors and what they mean.

One source of confusion is the inconsistent terminology for points and ranges of points. The individual points are referred to as both "events" and "cycles." "Cycle" makes sense for this application, since the events are often individual respiratory cycles; however, the term often confused me because "cycle" implies a sequence, whereas in this model each cycle is treated as an atomic event. I recommend consistently using "event" for the finest granularity unless there is a good reason to do otherwise.

There is also some confusion about the different levels.
The model itself has three levels: level 1 contains the base-level variables denoting positive and negative events, level 2 groups negative events into "inter-event" intervals, and level 3 groups positive events and the intervals between them into sessions. However, some of the text instead refers to two levels, omitting the base level. For example: "We also constrain the labels on the bottom level to be negative if the labels in the first level are negative..." Intuitively, shouldn't the bottom level be the same as the first level? The level numbering needs to be clear and consistent.

Minor: "map" -> "MAP" (Sec. 3.4). Thank you for making the results figures readable.

Significance - Justification:
This is a very nice application paper: the descriptions are detailed and clear, the experimental method is sound, the model itself is novel and tractable, and the model is a good fit to the application. In addition to working well on these particular applications, the methods may inspire models for related activity recognition problems. The methods used here are not revolutionary; they build on standard CRF and structural SVM approaches, such as hierarchical CRFs and probabilistic CFGs, and the proposed models do not immediately apply to a wide range of applications. Nonetheless, I think the significance is still "Above Average" because the idea is executed and presented well and could inform a number of related current and future applications.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
See above. The model description is a bit confusing, but otherwise the paper is quite clear. The methods are sound, somewhat novel, and a good fit to real-world applications.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors present a new method for jointly detecting individual events and grouping these events into periods with a high rate of events.
This problem is motivated by mHealth, where people wear sensors continuously to determine, for example, how many snacks they ate in the last week and how long they spent snacking. The authors' work is based on a hierarchical conditional random field that explicitly models spans. Their approach admits a quadratic-time inference algorithm via dynamic programming. The authors show results competitive with logistic regression for event detection and with a span-free hierarchical CRF.

Clarity - Justification:
The paper is very well written and easy to follow.

Significance - Justification:
The significance stems from the importance of the problem for making sensors in mHealth practical.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
1) What is the point of the nonlinear transformation of the features? Is there some property of the sensor that it is meant to model?
2) Do the significance results correct for the multiple tests run, or are they intended simply to point out which results are worth paying attention to?
3) It would be nice to replicate the random forest result of Thomaz et al. for event detection. It seems as simple to implement as the LR baseline and might make it clearer what one should use in practice.
4) Which features are important?

===== Review #4 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a method for segmenting time series with an intermediate layer of event labels that takes into account the intervals between successive occurrences of events. This is of practical interest, since hierarchical event structures are not uncommon in real-world time series. It is well written. The particular advantages of the proposed method are not entirely clear: compared with the alternatives studied, it appears to be more useful for segmenting activity than for detecting events.

Clarity - Justification:
This is a clearly written and well-structured paper.
Significance - Justification:
Compared with a strong alternative, the T-CRF algorithm, the proposed method does not seem to have a clear advantage in event detection performance. Nor does it have a clear edge over a much simpler method based on logistic regression. On activity segmentation, a task perhaps more relevant to the anticipated utility of the proposed method, it wins significantly on two of the four benchmark data sets studied. Interestingly, this advantage is more pronounced on the data that are apparently harder to predict. This hints at the potential scope of utility of the proposed methods: they may complement existing methods where those methods do not do well, but be redundant where they do. This clearly requires further study.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This paper is nicely structured and well written. It would benefit from more comprehensive comparisons against established methods of event detection (all sorts of control charts come to mind) and against various segmentation methods, including time series clustering. Focusing on the unique strengths of your method relative to those alternatives would help identify exactly when it may be of use.

=====
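Review #3 asks whether the reported significance results correct for multiple tests, and Review #1 asks for a single fixed threshold. One standard way to address both is the Holm-Bonferroni step-down procedure, which controls the family-wise error rate at one chosen level alpha. The following is a minimal sketch of that procedure; the p-values in the example are made up for illustration and are not taken from the paper.

```python
from typing import List


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm-Bonferroni step-down procedure.

    Returns, for each hypothesis in input order, whether it is rejected
    while controlling the family-wise error rate at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # indices by ascending p
    reject = [False] * m
    for rank, k in enumerate(order):
        # The rank-th smallest p-value is compared against alpha / (m - rank).
        if p_values[k] <= alpha / (m - rank):
            reject[k] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Illustrative p-values only (not from the paper).
print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Uncorrected, all four of these p-values would pass 0.05; after the correction only two do.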