Poster in Workshop: Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities

Towards a Better Theoretical Understanding of Independent Subnetwork Training

Egor Shulgin · Peter Richtarik


Abstract:

Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models places excessive pressure on communication channels, much recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models rely on some form of model parallelism as well. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for addressing the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.
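To make the IST idea concrete, the following is a minimal illustrative sketch on the quadratic objective mentioned in the abstract, f(x) = 1/2 x^T A x - b^T x: in each round, the coordinates of x are partitioned across workers, each worker runs a few gradient steps only on its own block (its "subnetwork") with the remaining coordinates frozen, and the server reassembles the blocks. All function names, hyperparameters, and the coordinate-partition scheme here are assumptions for illustration, not the exact algorithm or analysis from the paper.

```python
import numpy as np


def ist_quadratic(A, b, num_workers=4, local_steps=5, rounds=50, lr=0.1, seed=0):
    """Illustrative IST-style loop for f(x) = 0.5 * x^T A x - b^T x.

    Each round: partition coordinates across workers, let every worker
    take a few gradient steps restricted to its own coordinate block,
    then reassemble the blocks on the server. Hypothetical sketch, not
    the paper's exact method.
    """
    rng = np.random.default_rng(seed)
    d = b.shape[0]
    x = np.zeros(d)
    for _ in range(rounds):
        perm = rng.permutation(d)
        blocks = np.array_split(perm, num_workers)
        new_x = x.copy()
        for block in blocks:
            # Worker update: local gradient steps on its block only,
            # with the other coordinates of x kept fixed.
            x_local = x.copy()
            for _ in range(local_steps):
                grad = A @ x_local - b
                x_local[block] -= lr * grad[block]
            new_x[block] = x_local[block]
        x = new_x  # Server reassembles the independently trained blocks.
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = 20
    M = rng.standard_normal((d, d))
    A = M @ M.T + d * np.eye(d)  # well-conditioned positive-definite matrix
    b = rng.standard_normal(d)
    x_ist = ist_quadratic(A, b)
    x_star = np.linalg.solve(A, b)
    print("distance to optimum:", np.linalg.norm(x_ist - x_star))
```

The communication savings come from each worker exchanging only its own block of parameters per round rather than the full model, which is what distinguishes IST from pure data parallelism and from gradient-compression schemes that still store the entire model on every node.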
