Poster in Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

Sami Jaghouar · Johannes Hagemann


Abstract:

OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the DiLoCo experiments, offered within a scalable, decentralized training framework built on the Hivemind library. We demonstrate its effectiveness by training a model across two continents and four countries. Additionally, we conduct an analytical evaluation of its practicality, focusing on the algorithm's compute efficiency and its scalability with the number of workers. Our findings indicate that while DiLoCo can be effective in specific scenarios, it is not necessarily a low-communication replacement for Distributed Data Parallel training, owing to its lower compute efficiency over shorter training runs.
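
For readers unfamiliar with DiLoCo, the sketch below illustrates the inner/outer structure the abstract refers to: each worker runs many local optimizer steps, and only the resulting pseudo-gradient is communicated and fed to an outer optimizer. This is a minimal single-worker PyTorch sketch, not code from the OpenDiLoCo repository; the function name diloco_train and the hyperparameter defaults are illustrative assumptions, and the cross-worker all-reduce is elided to a comment.

```python
import copy
import torch
import torch.nn as nn

def diloco_train(model, data_loader, rounds=10, H=500,
                 inner_lr=1e-4, outer_lr=0.7, outer_momentum=0.9):
    """Single-worker sketch of the DiLoCo loop: H local AdamW steps per
    round, followed by one outer SGD step with Nesterov momentum applied
    to the pseudo-gradient. Hyperparameter defaults are illustrative."""
    # Global (synchronized) copy of the weights; the outer optimizer's
    # momentum state lives on this copy and persists across rounds.
    global_model = copy.deepcopy(model)
    outer_opt = torch.optim.SGD(global_model.parameters(), lr=outer_lr,
                                momentum=outer_momentum, nesterov=True)
    inner_opt = torch.optim.AdamW(model.parameters(), lr=inner_lr)
    loss_fn = nn.CrossEntropyLoss()
    data_iter = iter(data_loader)

    for _ in range(rounds):
        # Each round starts from the current global weights
        # (in-place copy, so inner_opt keeps referencing the same tensors).
        model.load_state_dict(global_model.state_dict())

        for _ in range(H):  # inner loop: ordinary local training
            try:
                x, y = next(data_iter)
            except StopIteration:
                data_iter = iter(data_loader)
                x, y = next(data_iter)
            inner_opt.zero_grad()
            loss_fn(model(x), y).backward()
            inner_opt.step()

        # Pseudo-gradient = global weights minus locally trained weights.
        # With multiple workers, this is the only quantity that would be
        # all-reduced -- once every H steps instead of every step.
        outer_opt.zero_grad()
        for gp, lp in zip(global_model.parameters(), model.parameters()):
            gp.grad = gp.detach() - lp.detach()
        outer_opt.step()

    return global_model
```

Because synchronization happens only once every H inner steps, the communication volume drops by roughly a factor of H relative to Distributed Data Parallel training, which exchanges gradients at every step.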
