On the Optimization Trajectory of DeepWalk Embeddings
Abstract
The DeepWalk algorithm is widely used for learning node embeddings of graphs. Combined with negative sampling, it scales easily to graphs with millions of nodes. However, theoretical guarantees on the resulting embeddings are far less understood. Recent work has studied the minimizers of the objective and established guarantees for certain graph classes, but the optimization trajectory, i.e., what happens when we start from a random initialization and run gradient descent, remains poorly understood. This is especially true for the implementation of DeepWalk via Skip-gram with negative sampling (SGNS), since the variance of the stochastic updates turns out to be very large. In this work, we make progress on this question. We show that for "small norm" initialization, under a spectral gap assumption on the graph, the DeepWalk embeddings align with the column space of a fixed low-rank matrix. For graphs generated from Stochastic Block Models satisfying certain separation conditions, our results imply that the DeepWalk embeddings recover the cluster structure. To the best of our knowledge, ours is the first analysis of the optimization trajectory of DeepWalk with negative sampling on non-trivial graph classes.
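For context, a minimal sketch of the per-pair SGNS objective that DeepWalk optimizes; the notation below (embedding vectors \(x_u, y_v\), negative-sampling distribution \(P_n\), number of negatives \(k\)) is standard for SGNS but is our assumption, not taken from this abstract:
\[
\ell(u, v) \;=\; \log \sigma\big(\langle x_u, y_v \rangle\big) \;+\; k\,\mathbb{E}_{v' \sim P_n}\!\big[\log \sigma\big(-\langle x_u, y_{v'} \rangle\big)\big],
\qquad \sigma(t) = \frac{1}{1 + e^{-t}},
\]
summed over node pairs \((u, v)\) that co-occur within the skip-gram window along the random walks. Stochastic gradient updates replace the expectation with \(k\) sampled negatives, which is one source of the large update variance mentioned above.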