Paper ID: 1040
Title: Early and Reliable Event Detection Using Proximity Space Representation

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a novel framework for early, reliable detection of events in time series. The method is based on similarity measures that embed sequences of frames, and comes with Rademacher-based generalization guarantees.

Clarity - Justification:
The paper is well written and the mathematical details can be followed.

Significance - Justification:
Early and reliable event detection seems like something that could have a lot of practical use. The algorithm is claimed to achieve similar accuracy while being faster than the state of the art.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The event detection algorithm is well described and seems reasonable. However, the empirical results could be stronger. While they do indicate that the model has some interesting properties, many details are missing, making evaluation difficult.

While detection accuracy is not necessarily better, and is often slightly worse, than the competitor's (MMED), the algorithm is more reliable by the authors' definition. This seems like a positive. Additionally, the proposed algorithm seems to be (sometimes?) faster than the competitors. The timing data is vague: in the first experiment, on the toy data, the proposed method is claimed to be 100 times faster; on the BCI dataset no timing information is given; on the emotions dataset the claim is that the proposed algorithm runs an order of magnitude faster. Can the authors please make this data more precise?

On the BCI data, why are the parameters the same as on the toy data? How are parameters selected for the emotions dataset?

=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors present a method for early event detection that is reliable in the sense that the decision rule is self-consistent with decisions made with more information. Their approach is based on increasing similarity functions parameterized by "landmarks" against which all inputs are compared. This, combined with positive weights, gives reliability. To bias towards early detection, the authors introduce a preset mu parameter that prefers earlier information. The use of mu rather than unrolling provides a large computational benefit. The results are comparable to their baseline.

Clarity - Justification:
The paper was very straightforward to read.

Significance - Justification:
Early detection of events is a central problem in online analysis.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Some comments:
- The choice of mu suggests that some external utility is preferred over the modeling objective. Can it be rolled into the objective without training on truncated sequences?
- Some sort of increasing recurrent structure could capture all the desired properties.
- Why not directly optimize the metric in Figure 2, that is, the normalized time-to-detect for a small false-positive rate? (A sketch of that metric follows at the end of this review.)

Minor comments:
- Missing reference, line 245.
- Line 814: "ratein" -> "rate in".
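For reference, the curve the third comment refers to pairs, for each decision threshold, a false-positive rate over negative sequences with a normalized time-to-detect over positive ones; presumably this is the AMOC curve whose area (AUAMOC) is reported in the experiments. A minimal Python sketch of how such a curve could be computed; the function and variable names are illustrative, not taken from the paper:

    import numpy as np

    def amoc_curve(scores, labels, event_ends, thresholds):
        """For each threshold, pair the false-positive rate over negative
        sequences with the median normalized time-to-detect over positives.

        scores:     list of 1-D arrays (detector score over time, per sequence)
        labels:     list of {0, 1} flags (1 if the sequence contains the event)
        event_ends: per-sequence index where the event ends (for normalization)
        """
        n_neg = max(1, labels.count(0))
        curve = []
        for thr in thresholds:
            false_pos, ttd = 0, []
            for s, y, t_end in zip(scores, labels, event_ends):
                fired = np.flatnonzero(np.asarray(s) >= thr)
                if y == 0:
                    false_pos += int(fired.size > 0)  # any alarm on a negative
                elif fired.size > 0:
                    ttd.append(min(fired[0] / t_end, 1.0))  # normalized time
                else:
                    ttd.append(1.0)  # a missed event counts as the latest time
            curve.append((false_pos / n_neg,
                          float(np.median(ttd)) if ttd else 1.0))
        return sorted(curve)

Optimizing this quantity directly is non-smooth in the model parameters, which is presumably why a surrogate objective is trained instead; the question of whether a smoothed version could serve as the training objective stands.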
=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a max-margin based approach to detect temporal events from time series given only the first part of the observations. The decision function is constructed from a non-decreasing similarity measure k(·,·), which computes the similarity between a partial time series and a pre-defined landmark p. The proposed approach is designed and evaluated by two criteria: earliness and reliability. Compared to previous work (MMED), the authors claim their method to be more efficient.

Clarity - Justification:
The writing is in general OK, but some key technical information seems blurred.

1. The claimed property of 'reliability' is confusing and misleading. Given the definition in 3.1, 'reliability' actually means 'no false alarm', i.e., it says 'precision = 100%' but nothing about recall. The authors' claim (line 104) that their framework "ensure[s] that the decision with a partial observation is identical to the one achieved with the full sequence" seems to imply that both precision and recall are 100% given partial observations. I cannot see how 'reliability' guarantees that. (The sketch at the end of this review makes the asymmetry concrete.)

2. The experimental setup in section 6.1 (toy dataset) is not clear enough to me. The authors designed the toy dataset with two classes, and 'each class is a one-second linear chirp'. So what is the difference between the two classes? Could you plot some of the time series, similar to Fig. 6 in (Hoai et al., 2014), to help readers understand your toy dataset?

Significance - Justification:
I would consider the contribution of this paper incremental.

1. The claimed properties of 'earliness' and 'reliability' are not impressive in the experimental results. Compared to the earlier work (MMED), the 'earliness' measurement (AUAMOC) is on par, slightly worse, and slightly better on the three datasets respectively, and the 'reliability' measurement (AUC) is on par, slightly worse, and slightly worse respectively. The only advantage of the proposed algorithm seems to be the running time.

2. The idea of exploring 'reliability' is pretty novel for early event detection, but, as mentioned above, I cannot agree with its definition in this paper. It might be more interesting to define 'reliability' as AUROC directly and maximize AUROC given a fixed earliness measurement.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
1. [sec 3.3] It would be helpful to further discuss how the weighted l1 norm on w promotes earliness. I am confused about the dimension of w: is it m (line 385) or T (line 405), or is in fact m = T? From my understanding, each landmark is a pre-defined local feature, and the authors want to de-emphasize the late-appearing landmarks by forcing the corresponding weights in w to be small. This suggests that 'earliness' depends heavily on a proper choice of the landmarks p: given a feature (landmark) that matches the beginning of the events, 'early detection' is more likely to be achieved. I wish the authors would illustrate how they choose those landmarks in the experiment section.

2. [sec 3.4] Prop 3.2 might need more clarification. Is it trivial to apply (Kakade et al., 2009) when there is a feature map \psi in (2)? Especially if k(·,·) is a norm but not an inner product?

3. A more careful proofread is needed to avoid 'section ??' in lines 225 and 364, the inconsistent notations 'AUROC', 'AUAROC', and 'AUC' in lines 635, 648, and 762, etc.
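To make the reliability discussion above concrete: if every landmark similarity k(x_{1:t}, p_j) is non-decreasing in t and the weights are nonnegative, the score can only grow as more frames arrive, so a positive decision is never retracted; nothing, however, forces the score to ever cross the threshold, which is exactly the precision-versus-recall asymmetry raised in point 1 above. A minimal, self-contained Python sketch; the random landmarks, the running-max similarity, and the clamp at zero are illustrative assumptions, not the paper's construction:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative landmarks: short reference patterns (not the paper's choice).
    landmarks = [rng.standard_normal(5) for _ in range(3)]
    w = np.array([0.7, 0.2, 0.5])  # nonnegative weights keep the score monotone
    bias = -2.0                    # decision rule: detect once the score exceeds 0

    def similarity(prefix, p):
        """Best correlation of landmark p with any window of the prefix seen
        so far; a running maximum clamped at zero is non-decreasing in t."""
        L = len(p)
        if len(prefix) < L:
            return 0.0
        best = max(float(np.dot(prefix[i:i + L], p))
                   for i in range(len(prefix) - L + 1))
        return max(0.0, best)

    def score(prefix):
        return float(np.dot(w, [similarity(prefix, p) for p in landmarks])) + bias

    x = rng.standard_normal(60)                     # one incoming time series
    scores = [score(x[:t]) for t in range(1, len(x) + 1)]
    assert all(s1 <= s2 for s1, s2 in zip(scores, scores[1:]))  # never decreases

    # Consequence: sign(score) flips from negative to positive at most once, so
    # an early positive decision always agrees with the full-sequence decision,
    # while an early negative decision carries no such guarantee.

On the earliness mechanism queried in detailed comment 1: a weighted l1 penalty sum_j beta_j |w_j|, with larger beta_j on landmarks that only match late-appearing patterns, would drive those weights toward zero and hence bias the detector toward evidence available early; spelling this out, along with the choice of landmarks it presupposes, would strengthen the paper.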
=====
Review #4
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a method for the timely detection of events in time series.

Clarity - Justification:
The paper is easy to read and follow. There are a few typos:
- There is a problem with a cross-reference on line 225.
- There seems to be a glitch (maybe a mistake in the definition of a LaTeX command) wherever "MIL" appears: it always lacks a space separating it from the next word.

Significance - Justification:
The paper addresses a very relevant class of applications (early detection of events in time-dependent data) that is often missing from ML research. The authors propose a formulation of these tasks as a multi-instance learning problem. This formulation seems sound and interesting. The experimental evaluation of the proposal shows interesting results.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This is an interesting proposal for addressing a set of tasks that is not well covered by the ML literature. In this context, I think this paper does a good job of raising awareness of these relevant problems. The proposal that is presented is interesting and seems sound, providing reasonably convincing results on the experiments that were carried out.

There is a very related work that is not cited by the authors: Fawcett and Provost (1999), "Activity monitoring: noticing interesting changes in behavior", in Proc. of KDD'99.

=====