Efficient AI Deployment on Legacy Data Centers in the Global South
Abstract
Data centers in the Global South face a triple constraint: legacy CPU-centric infrastructure, few accelerators, and heterogeneous hardware sourced from multiple donors and vendors as a result of fragmented economic aid. This paper proposes a scheduling framework for such environments that handles training and inference separately. For training, we dynamically select among data, model, and pipeline parallelism based on workload characteristics and the heterogeneous GPUs available. For inference, we employ speculative decoding, continuous batching, and KV caching on CPU–GPU hybrid nodes. We then show how FlagOS provides a unified execution layer that abstracts vendor differences (NVIDIA, AMD, Intel, Huawei Ascend, and edge TPUs), enabling seamless integration of donated hardware. In a realistic simulation of a Nigerian university data center (320 CPU cores plus mixed donated GPUs), our approach improves training throughput by 2.3× and reduces inference latency by 58% relative to a naive GPU-only deployment, while cutting hardware acquisition costs by 70%.
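To make the training-side selection concrete, the following is a minimal illustrative sketch of a memory-based rule for choosing among data, pipeline, and model parallelism on a mixed GPU pool. It is not the scheduler proposed in this paper: the `Gpu` class, the `choose_parallelism` function, and the thresholds are hypothetical, and the sketch assumes that `model_gb` is the full per-replica training footprint (weights, gradients, and optimizer state) and that the smallest device is the binding limit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gpu:
    """Hypothetical description of one donated accelerator."""
    name: str
    memory_gb: float


def choose_parallelism(model_gb: float, largest_layer_gb: float, gpus: list[Gpu]) -> str:
    """Pick a strategy from per-device memory limits (illustrative heuristic only).

    model_gb: assumed per-replica training footprint of the whole model.
    largest_layer_gb: assumed footprint of the single largest layer.
    """
    if not gpus:
        raise ValueError("at least one GPU is required")

    smallest = min(g.memory_gb for g in gpus)  # weakest device bounds a replica
    total = sum(g.memory_gb for g in gpus)     # aggregate capacity of the pool

    if model_gb > total:
        raise ValueError("model does not fit in aggregate GPU memory")
    if model_gb <= smallest:
        return "data"      # a full replica fits on every device
    if largest_layer_gb <= smallest:
        return "pipeline"  # whole layers fit per device, so split by stage
    return "model"         # a single layer must be sharded across devices


if __name__ == "__main__":
    # Hypothetical mixed pool: one 8 GB card and one 16 GB card.
    pool = [Gpu("donated-a", 8.0), Gpu("donated-b", 16.0)]
    print(choose_parallelism(12.0, 0.5, pool))
```

A real scheduler would also weigh interconnect bandwidth, per-device compute throughput, and batch size; the rule above uses memory alone to keep the decision order easy to follow.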