

Poster

MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations

Zhangyu Wang · Gengchen Mai · Krzysztof Janowicz · Ni Lao


Abstract:

A wide range of (multivariate) temporal (1D) and spatial (2D) data analysis tasks, such as grouping vehicle sensor trajectories, can be formulated as clustering given metric constraints. Existing metric-constrained clustering algorithms overlook the rich correlation between feature similarity and metric distance, i.e., metric autocorrelation. The model-based variations of these clustering algorithms (e.g., TICC and STICC), which achieved SOTA performance, additionally suffer from computational instability and high complexity because they rely on a metric-constrained Expectation-Maximization (EM) procedure. To address these two problems, we propose a novel clustering algorithm, MC-GTA (Model-based Clustering via Goodness-of-fit Tests with Autocorrelations). Its objective consists solely of pairwise weighted sums of feature similarity terms (squared Wasserstein-2 distance) and metric autocorrelation terms (a novel multivariate generalization of the classic semivariogram). We show that MC-GTA effectively minimizes the total hinge loss for intra-cluster observation pairs not passing goodness-of-fit tests, i.e., statistically not originating from the same distribution. Experiments on 1D/2D synthetic and 7 real-world datasets demonstrate that MC-GTA successfully incorporates metric autocorrelation. It outperforms strong baselines (TICC/STICC) by large margins (up to 14.3% in ARI and 32.1% in NMI) with faster and more stable optimization (>10x speedup).
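
To make the feature similarity term more concrete, the following is a minimal sketch of a squared Wasserstein-2 distance and a hinge penalty on it. It is not the authors' implementation. It assumes that each observation is summarized by a Gaussian model (mean and covariance), for which the distance has a well-known closed form; the abstract itself does not state the model family. The function names and the `threshold` parameter are hypothetical, and `threshold` merely stands in for whatever bound the goodness-of-fit test supplies for a given pair.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def squared_w2_gaussian(m1, s1, m2, s2):
    """Closed-form squared Wasserstein-2 distance between two Gaussians.

    Assumption for illustration: observations are modeled as N(m1, s1)
    and N(m2, s2).
    """
    r = psd_sqrt(s2)
    cross = psd_sqrt(r @ s1 @ r)
    mean_term = np.sum((np.asarray(m1) - np.asarray(m2)) ** 2)
    cov_term = np.trace(s1 + s2 - 2.0 * cross)
    return float(mean_term + cov_term)


def hinge_pair_loss(w2_sq, threshold):
    """Hypothetical hinge penalty: zero when the distance is within the bound."""
    return max(0.0, w2_sq - threshold)


if __name__ == "__main__":
    eye = np.eye(2)
    d = squared_w2_gaussian(np.zeros(2), eye, np.array([3.0, 4.0]), eye)
    print(d, hinge_pair_loss(d, threshold=10.0))
```

In this sketch a pair contributes to the loss only when its distance exceeds the bound, which mirrors the abstract's description of penalizing intra-cluster pairs that do not pass a goodness-of-fit test.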
