

Poster in Workshop: Localized Learning: Decentralized Model Updates via Non-Global Objectives

Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

Jiong Zhu · Aishwarya Naresh Reganti · Edward Huang · Charles Dickens · Nikhil Rao · Karthik Subbian · Danai Koutra

Keywords: [ model aggregation training ] [ scalability ] [ localized learning ] [ Graph Neural Networks ] [ Distributed Learning ]


Abstract:

Conventional distributed Graph Neural Network (GNN) training relies on either inter-instance communication or periodic fallback to centralized training, both of which create overhead and constrain scalability. In this work, we propose a streamlined framework for distributed GNN training that eliminates these costly operations, yielding improved scalability, convergence speed, and performance over state-of-the-art approaches. Our framework (1) comprises independent trainers that asynchronously learn local models from locally available parts of the training graph, and (2) synchronizes these local models only through periodic (time-based) model aggregation. Contrary to prevailing belief, our theoretical analysis shows that maximizing the recovery of cross-instance node dependencies is not essential to achieving performance parity with centralized training. Instead, our framework leverages randomized assignment of nodes or super-nodes (i.e., collections of original nodes) when partitioning the training graph, which enhances data uniformity and minimizes discrepancies in gradients and loss functions across instances. Experiments on social and e-commerce networks with up to 1.3 billion edges show that our proposed framework achieves state-of-the-art performance and a 2.31x speedup over the fastest baseline, despite using less training data.
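To make the training scheme in the abstract concrete, here is a minimal PyTorch sketch of the two ingredients it describes: randomized assignment of nodes to independent trainers, and periodic parameter averaging as the only cross-instance synchronization. This is not the authors' implementation; the names `LocalModel`, `random_partition`, and `aggregate` are hypothetical, a plain linear layer stands in for a GNN, and the batches are synthetic placeholders.

```python
import copy
import random

import torch
import torch.nn as nn


# Hypothetical stand-in for a local GNN; any nn.Module with an identical
# architecture across trainers works for this sketch.
class LocalModel(nn.Module):
    def __init__(self, in_dim=16, num_classes=4):
        super().__init__()
        self.lin = nn.Linear(in_dim, num_classes)

    def forward(self, x):
        return self.lin(x)


def random_partition(node_ids, num_trainers, seed=0):
    # Randomized assignment of nodes (or super-nodes) to trainers,
    # intended to keep the data distribution uniform across instances.
    rng = random.Random(seed)
    shuffled = list(node_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::num_trainers] for i in range(num_trainers)]


def aggregate(models):
    # Periodic (time-based) model aggregation: average the parameters of
    # all local models and broadcast the average back to every trainer.
    avg_state = copy.deepcopy(models[0].state_dict())
    for key in avg_state:
        avg_state[key] = torch.stack(
            [m.state_dict()[key] for m in models]
        ).mean(dim=0)
    for m in models:
        m.load_state_dict(avg_state)


# Toy driver: independent local training steps, synchronized only by aggregation.
num_trainers, rounds, agg_interval = 4, 20, 5
parts = random_partition(range(1000), num_trainers)
models = [LocalModel() for _ in range(num_trainers)]
opts = [torch.optim.SGD(m.parameters(), lr=0.1) for m in models]
loss_fn = nn.CrossEntropyLoss()

for step in range(rounds):
    for model, opt, part in zip(models, opts, parts):
        # Placeholder batch: in the real setting, features and labels come
        # from the trainer's locally available partition of the graph.
        x = torch.randn(len(part) // 10, 16)
        y = torch.randint(0, 4, (x.size(0),))
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    if (step + 1) % agg_interval == 0:
        aggregate(models)  # the only cross-instance synchronization point
```

Simple parameter averaging is used here only as an illustrative aggregation rule; the paper's actual aggregation schedule, trainer asynchrony, and GNN architecture may differ.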
