Poster
Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time
Laxman Dhulipala · David Eisenstat · Jakub Łącki · Vahab Mirrokni · Jessica Shi
Keywords: [ Clustering ] [ Algorithms ]
Abstract:
We study the widely-used hierarchical agglomerative clustering (HAC)
algorithm on edge-weighted graphs. We define an algorithmic framework
for hierarchical agglomerative graph clustering that provides the
first efficient $\tilde{O}(m)$ time exact algorithms for classic
linkage measures, such as complete- and WPGMA-linkage, as well as
other measures. Furthermore, for average-linkage, arguably the most
popular variant of HAC, we provide an algorithm that runs in
$\tilde{O}(n\sqrt{m})$ time. For this variant, this is the first
exact algorithm that runs in subquadratic time, as long as
$m=n^{2-\epsilon}$ for some constant $\epsilon > 0$. We complement
this result with a simple $\epsilon$-close approximation algorithm for
average-linkage in our framework that runs in $\tilde{O}(m)$ time.
As an application of our algorithms, we consider clustering points in
a metric space by first using $k$-NN to generate a graph from the
point set, and then running our algorithms on the resulting weighted
graph. We validate the performance of our algorithms on publicly
available datasets, and show that our approach can speed up clustering
of point datasets by a factor of 20.7--76.5x.
Chat is not available.