Fully Dynamic Coreset Spectral Clustering
Ben Jourdan ⋅ Peter Macgregor ⋅ Gregory Schwartzman
Abstract
We present a fully dynamic data structure that supports edge and node updates and cluster membership queries for the Normalised Cut problem with strong theoretical guarantees. Furthermore, our data structure outperforms the state of the art significantly on real world datasets. At the heart of our data structure is the novel notion of *Just-in-Time Sampling Trees*. The worst-case edge update time of our data structure is $O(\log n)$ where $n$ is the number of nodes in the current graph. Let $d_{\max}$ be the maximum degree of the current graph, let $T_{NC}(n',k')$ be the running time of an $\alpha$-approximation algorithm for the Normalised Cut problem on $n'$ vertices and $k'$ clusters, and let $\text{vol}(Y)$ be the sum of the unweighted degrees of all nodes in a set $Y$. The worst-case query time of our data structure to label all nodes in $Y$ is $O\left(kd_{max}^2 \log(n) + \gamma(n,\epsilon,k,Y)\right)$, with approximation ratio $\alpha \frac{1+\epsilon}{1-\epsilon}$, where $\gamma(n,\epsilon,k,Y)$ is defined to be $\log(n)\log^\star(n)\epsilon^{-4}k^2 + T_{NC}(\epsilon^{-4}k^2,k) + \epsilon^{-8}k^4 +vol(Y)$. Assuming $d_{\max}$ is polylogarithmic, as is the case with many sparse real-world graphs, our method achieves the best known trade-off between query time and update time.
Successful Page Load