Efficient Distributed MLLM Training with ModalGlue
Insu Jang ⋅ Runyu Lu ⋅ Nikhil Bansal ⋅ Ang Chen ⋅ Mosharaf Chowdhury
Abstract
Multimodal large language models (MLLMs) extend the capabilities of large language models (LLMs) by combining heterogeneous model architectures to handle diverse modalities such as images and audio. However, this inherent heterogeneity in model structure and data types makes makeshift extensions of existing LLM training frameworks ill-suited for efficient MLLM training, especially at distributed scale. In this paper, we present ModalGlue, an efficient distributed MLLM training framework that accounts for MLLMs' unique characteristics in both model and data parallelization. ModalGlue introduces frozen-aware pipeline parallelism and workload-balanced context parallelism to improve MLLM training throughput. Our extensive evaluation shows that ModalGlue outperforms state-of-the-art solutions by $2.26\times$ on average in MLLM training throughput.