

Afternoon Poster
in
Workshop: Artificial Intelligence & Human Computer Interaction

Informed Novelty Detection in Sequential Data by Per-Cluster Modeling

Linara Adilova · Siming Chen · Michael Kamp


Abstract:

Novelty detection in discrete sequences is a challenging task, since deviations from the process generating the normal data are often small or intentionally hidden. In many applications, data is generated by several distinct processes, so that models trained on all the data tend to over-generalize and novelties remain undetected. We propose to approach this challenge through decomposition: by clustering the data we break down the problem, obtaining a simpler modeling task in each cluster that can be solved more accurately. However, this comes at a cost, since the amount of training data per cluster is reduced. This is a particular problem for discrete sequences, where state-of-the-art models are data-hungry. The success of this approach thus depends on the quality of the clustering, i.e., whether the individual learning problems are sufficiently simpler than the joint problem. Since clustering discrete sequences automatically is a challenging and domain-specific task, in this paper we adapt a state-of-the-art visual analytics tool for discrete sequence clustering to obtain informed clusters from domain experts. We then model each cluster with an LSTM. Our empirical evaluation indicates that this informed clustering outperforms automatic clustering and that our approach outperforms standard novelty detection methods for discrete sequences in three real-world application scenarios.
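
The per-cluster idea can be illustrated with a minimal sketch. This is not the authors' implementation: to stay self-contained it swaps the LSTM for a smoothed first-order Markov (bigram) model, and the cluster labels "A" and "B", the toy sequences, and all function names are hypothetical. Scoring a sequence by its best-fitting cluster model is likewise an assumption made for illustration.

```python
import math
from collections import defaultdict


class BigramModel:
    """First-order Markov model with add-one smoothing.

    Stand-in for the per-cluster LSTM; chosen only to keep the sketch small.
    """

    def __init__(self, alphabet):
        self.alphabet = sorted(alphabet)
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        # Count symbol-to-symbol transitions over all training sequences.
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1
        return self

    def nll(self, seq):
        """Average negative log-likelihood per transition (higher = less expected)."""
        total, steps = 0.0, 0
        for prev, cur in zip(seq, seq[1:]):
            row = self.counts.get(prev, {})
            p = (row.get(cur, 0) + 1) / (sum(row.values()) + len(self.alphabet))
            total -= math.log(p)
            steps += 1
        return total / max(steps, 1)


def fit_per_cluster(sequences, labels, alphabet):
    """Train one model per cluster; labels are assumed to come from a domain expert."""
    by_cluster = defaultdict(list)
    for seq, label in zip(sequences, labels):
        by_cluster[label].append(seq)
    return {label: BigramModel(alphabet).fit(seqs) for label, seqs in by_cluster.items()}


def novelty_score(seq, models):
    """Score under the best-fitting cluster model (an assumption of this sketch)."""
    return min(model.nll(seq) for model in models.values())


# Toy data: two distinct generating processes, labeled by a hypothetical expert.
train = ["abababab", "babababa", "ccddccdd", "ddccddcc"]
labels = ["A", "A", "B", "B"]
models = fit_per_cluster(train, labels, alphabet="abcd")

normal_score = novelty_score("ababab", models)   # follows process A
novel_score = novelty_score("adadad", models)    # matches neither process
print(normal_score < novel_score)
```

A sequence is flagged as novel when its score exceeds a threshold chosen on held-out normal data; how that threshold is set is left open here.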
