Terminal Dimension Reduction for Time Series with Applications
Alexander Munteanu ⋅ Matteo Russo ⋅ David Saulpic ⋅ Chris Schwiegelshohn
Abstract
Terminal embeddings have emerged as a powerful tool for dimension reduction. Given a set of points $P\subset \mathbb{R}^d$, a terminal embedding is a mapping $f:\mathbb{R}^d\rightarrow \mathbb{R}^t$ that preserves the pairwise distance between any pair of points $p\in P$ and $q\in \mathbb{R}^d$ up to small distortion under this mapping. Terminal embeddings have been particularly fruitful for constructing $k$-means and $k$-median coresets, where the objective is to find a typically weighted subset $\Omega$ of $P$ such that for any candidate solution, the cost of the clustering objective on $\Omega$ approximates the cost of the clustering objective on $P$ up to small distortion. Unfortunately, these techniques have not been extended to more complicated structures such as clustering time-series data under common straight-line interpolation between measurements. The main issue is that terminal embeddings, arguably the central technique in this line of research, cannot be linear and are thus not immediately suitable to preserve linear structures. In this work, we develop a generalization of terminal embeddings to affine line-segments that overcomes this issue. We showcase their applicability by using our lines-preserving terminal embeddings to obtain the first dimension-free coresets for clustering time-series under the Fréchet distance. The underlying dimension reduction uses Johnson-Lindenstrauss embeddings, and our experiments indicate that they compare favorably against PCA for real-world time-series.
Successful Page Load