Keywords: [ Clustering ] [ Combinatorial Optimization ] [ Non-convex Optimization ] [ Optimization - Non-convex ]

Abstract:
Learning community structures in graphs that are randomly generated by stochastic block models (SBMs) has received much attention lately. In this paper, we focus on the problem of exactly recovering the communities in a binary symmetric SBM, where a graph of $n$ vertices is partitioned into two equal-sized communities and the vertices are connected with probability $p = \alpha\log(n)/n$ within communities and $q = \beta\log(n)/n$ across communities for some $\alpha>\beta>0$. We propose a two-stage iterative algorithm for solving this problem, which employs the power method with a random starting point in the first-stage and turns to a generalized power method that can identify the communities in a finite number of iterations in the second-stage. It is shown that for any fixed $\alpha$ and $\beta$ such that $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$, which is known to be the information-theoretical limit for exact recovery, the proposed algorithm exactly identifies the underlying communities in $\tilde{O}(n)$ running time with probability tending to one as $n\rightarrow\infty$. As far as we know, this is the first algorithm with nearly-linear running time that achieves exact recovery at the information-theoretical limit. We also present numerical results of the proposed algorithm to support and complement our theoretical development.