We address the problem of detecting distribution changes in multivariate data streams by means of histograms. Histograms are very general and flexible models, but they have received relatively little attention in the change-detection literature because they often require a number of bins that grows infeasibly with the data dimension. We present QuantTree, a recursive binary splitting scheme that adaptively defines the histogram bins to ease the detection of any distribution change. Our design scheme implies that i) we can easily control the overall number of bins and ii) the bin probabilities do not depend on the distribution of stationary data. The latter property is particularly relevant in change detection, since thresholds of test statistics based on these histograms (e.g., the Pearson statistic or the total variation) can be numerically computed from univariate, synthetically generated data while still guaranteeing a controlled false positive rate. Our experiments show that the proposed histograms are very effective in detecting changes in high-dimensional data streams, and that the resulting thresholds effectively control the false positive rate, even when the number of training samples is relatively small.
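The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_quanttree`, `bin_counts`, `pearson`, `mc_threshold`) are hypothetical, and the sketch assumes equiprobable target bins, continuous data without ties, at least as many training samples as bins, and a uniformly random choice of split dimension and tail.

```python
import numpy as np

def build_quanttree(train, n_bins, rng):
    """Carve the training set into n_bins bins holding about N / n_bins points each.

    Returns a list of (dimension, side, threshold) cuts; the last bin is
    whatever remains after all cuts. Assumes no ties and len(train) >= n_bins.
    """
    n_total = len(train)
    remaining = train
    cuts = []
    for k in range(n_bins - 1):
        # Number of training points that the k-th bin should contain.
        n_take = round(n_total * (k + 1) / n_bins) - round(n_total * k / n_bins)
        dim = int(rng.integers(train.shape[1]))   # random split dimension
        values = np.sort(remaining[:, dim])
        if rng.random() < 0.5:                    # cut off the lower tail
            side, thr = "low", values[n_take - 1]
            inside = remaining[:, dim] <= thr
        else:                                     # cut off the upper tail
            side, thr = "high", values[-n_take]
            inside = remaining[:, dim] >= thr
        cuts.append((dim, side, thr))
        remaining = remaining[~inside]
    return cuts

def bin_counts(cuts, batch):
    """Count how many points of batch fall in each bin, applying cuts in order."""
    counts = np.zeros(len(cuts) + 1, dtype=int)
    unassigned = np.ones(len(batch), dtype=bool)
    for k, (dim, side, thr) in enumerate(cuts):
        hit = batch[:, dim] <= thr if side == "low" else batch[:, dim] >= thr
        hit &= unassigned
        counts[k] = hit.sum()
        unassigned &= ~hit
    counts[-1] = unassigned.sum()
    return counts

def pearson(counts, probs):
    """Pearson statistic of observed bin counts against target bin probabilities."""
    expected = counts.sum() * probs
    return float(((counts - expected) ** 2 / expected).sum())

def mc_threshold(n_train, n_bins, batch_size, alpha, n_rounds, rng):
    """Estimate the (1 - alpha) quantile of the statistic from univariate uniform data."""
    probs = np.full(n_bins, 1.0 / n_bins)
    stats = []
    for _ in range(n_rounds):
        cuts = build_quanttree(rng.random((n_train, 1)), n_bins, rng)
        counts = bin_counts(cuts, rng.random((batch_size, 1)))
        stats.append(pearson(counts, probs))
    return float(np.quantile(stats, 1.0 - alpha))
```

In this sketch, a change would be flagged when `pearson(bin_counts(cuts, batch), probs)` on an incoming batch exceeds the value returned by `mc_threshold` for the same training size, number of bins, and batch size. The threshold is estimated on one-dimensional uniform data, relying on the abstract's statement that the bin probabilities do not depend on the stationary distribution.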
Giacomo Boracchi (Politecnico di Milano)
Diego Carrera (Politecnico di Milano)
Cristiano Cervellera (National Research Council)
Related Events (a corresponding poster, oral, or spotlight)
2018 Oral: QuantTree: Histograms for Change Detection in Multivariate Data Streams
Fri Jul 13th 09:30 -- 09:40 AM Room A6